**Orna Kupferman · Pawel Sobocinski (Eds.)**

# **Foundations of Software Science and Computation Structures**

**26th International Conference, FoSSaCS 2023 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023 Paris, France, April 22–27, 2023 Proceedings**

## Lecture Notes in Computer Science 13992

Founding Editors

Gerhard Goos, Germany; Juris Hartmanis, USA

## Editorial Board Members

Elisa Bertino, USA; Wen Gao, China; Bernhard Steffen, Germany; Moti Yung, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy; Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany; Benjamin C. Pierce, University of Pennsylvania, USA; Bernhard Steffen, University of Dortmund, Germany; Deng Xiaotie, Peking University, Beijing, China; Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at https://link.springer.com/bookseries/558

Editors

Orna Kupferman, The Hebrew University of Jerusalem, Jerusalem, Israel

Pawel Sobocinski, Tallinn University of Technology, Tallinn, Estonia

ISSN 0302-9743 · ISSN 1611-3349 (electronic)

Lecture Notes in Computer Science

ISBN 978-3-031-30828-4 · ISBN 978-3-031-30829-1 (eBook)

https://doi.org/10.1007/978-3-031-30829-1

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the 26th ETAPS! ETAPS 2023 took place in Paris, the beautiful capital of France. ETAPS 2023 was the 26th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronized conference programme enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe.

ETAPS 2023 received 361 submissions in total, 124 of which were accepted, yielding an overall acceptance rate of 34.3%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2023 featured the unifying invited speakers Véronique Cortier (CNRS, LORIA laboratory, France) and Thomas A. Henzinger (Institute of Science and Technology, Austria) and the conference-specific invited speakers Mooly Sagiv (Tel Aviv University, Israel) for ESOP and Sven Apel (Saarland University, Germany) for FASE. Invited tutorials were provided by Ana-Lucia Varbanescu (University of Twente and University of Amsterdam, The Netherlands) on heterogeneous computing and Joost-Pieter Katoen (RWTH Aachen, Germany and University of Twente, The Netherlands) on probabilistic programming.

As part of the programme we had the second edition of TOOLympics, an event to celebrate the achievements of the various competitions or comparative evaluations in the field of ETAPS.

ETAPS 2023 was organized jointly by Sorbonne Université and Université Sorbonne Paris Nord. Sorbonne Université (SU) is a multidisciplinary, research-intensive and world-class academic institution. It was created in 2018 as the merger of two first-class research-intensive universities, UPMC (Université Pierre et Marie Curie) and Paris-Sorbonne. SU has three faculties (humanities, medicine, and science and engineering), 55,600 students (4,700 PhD students; 10,200 international students), 6,400 teachers and professor-researchers, and 3,600 administrative and technical staff members. Université Sorbonne Paris Nord is one of the thirteen universities that succeeded the University of Paris in 1968. It is a major teaching and research center located in the north of Paris. It has five campuses, spread over the two departments of Seine-Saint-Denis and Val d'Oise: Villetaneuse, Bobigny, Saint-Denis, the Plaine Saint-Denis and Argenteuil. The university has more than 25,000 students in different fields, such as health, medicine, languages, humanities, and science. The local organization team consisted of Fabrice Kordon (general co-chair), Laure Petrucci (general co-chair), Benedikt Bollig (workshops), Stefan Haar (workshops), Étienne André (proceedings and tutorials), Céline Ghibaudo (sponsoring), Denis Poitrenaud (web), Stefan Schwoon (web), Benoît Barbot (publicity), Nathalie Sznajder (publicity), Anne-Marie Reytier (communication), Hélène Pétridis (finance) and Véronique Criart (finance).

ETAPS 2023 is further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), EASST (European Association of Software Science and Technology), LIP6 (Laboratoire d'Informatique de Paris 6), LIPN (Laboratoire d'informatique de Paris Nord), Sorbonne Université, Université Sorbonne Paris Nord, CNRS (Centre national de la recherche scientifique), CEA (Commissariat à l'énergie atomique et aux énergies alternatives), LMF (Laboratoire méthodes formelles), and Inria (Institut national de recherche en informatique et en automatique).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König (Duisburg), Thomas Noll (Aachen), Caterina Urban (Inria), Jan Křetínský (Munich), and Lenore Zuck (Chicago).

Other members of the steering committee are: Dirk Beyer (Munich), Luís Caires (Lisboa), Ana Cavalcanti (York), Bernd Finkbeiner (Saarland), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna), Orna Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Elizabeth Polgreen (Edinburgh), Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella (Edinburgh), Natasha Sharygina (Lugano), Pawel Sobocinski (Tallinn), Sebastián Uchitel (London and Buenos Aires), Andrzej Wasowski (Copenhagen), Stephanie Weirich (Pennsylvania), Thomas Wies (New York), Anton Wijs (Eindhoven), and James Worrell (Oxford).

I would like to take this opportunity to thank all authors, keynote speakers, attendees, organizers of the satellite workshops, and Springer-Verlag GmbH for their support. I hope you all enjoyed ETAPS 2023.

Finally, a big thanks to Laure and Fabrice and their local organization team for all their enormous efforts to make ETAPS a fantastic event.

April 2023

Marieke Huisman, ETAPS SC Chair, ETAPS e.V. President

## Preface

This volume contains the papers presented at the 26th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS 2023), which was held 24–27 April, 2023, in Paris, France. The conference is dedicated to foundational research with a clear significance for software science and brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

The program consisted of 26 contributed papers, selected from among 85 submissions. Each submission was assessed by three or more Program Committee members. The conference management system EasyChair was used to handle the submissions, to conduct the electronic Program Committee discussions, and to assist with the assembly of the proceedings.

We wish to thank all the authors who submitted papers for consideration, the members of the Program Committee for their conscientious work, and all additional reviewers who assisted the Program Committee in the evaluation process. Finally, we would like to thank the ETAPS organization for providing an excellent environment for FoSSaCS, other conferences, and workshops.

February 2023

Orna Kupferman, Pawel Sobocinski

## Organization

## Program Committee

| Name | Affiliation |
|---|---|
| Nathalie Bertrand | Inria, France |
| Thomas Colcombet | CNRS, France |
| Assia Mahboubi | Inria, France |
| Parosh Aziz Abdulla | Uppsala University, Sweden |
| Giovanni Bacci | Aalborg University, Denmark |
| Patrick Baillot | CNRS and Université de Lille, France |
| Lars Birkedal | Aarhus University, Denmark |
| Véronique Bruyère | University of Mons, Belgium |
| Marco Carbone | IT University of Copenhagen, Denmark |
| Ugo Dal Lago | Università di Bologna, Italy and Inria Sophia Antipolis, France |
| Emmanuel Filiot | Université libre de Bruxelles, Belgium |
| Marco Gaboardi | Boston University, USA |
| Bart Jacobs | Radboud University, The Netherlands |
| Bartek Klin | University of Oxford, UK |
| Orna Kupferman | Hebrew University of Jerusalem, Israel |
| Barbara König | University of Duisburg-Essen, Germany |
| Shahar Maoz | Tel Aviv University, Israel |
| Kuldeep S. Meel | National University of Singapore, Singapore |
| Stefan Milius | FAU Erlangen, Germany |
| Filip Murlak | University of Warsaw, Poland |
| Koko Muroya | RIMS, Kyoto University, Japan |
| Joel Ouaknine | Max Planck Institute for Software Systems, Germany |
| Alexandra Silva | University College London, UK |
| Pawel Sobocinski | Tallinn University of Technology, Estonia |
| Sam Staton | University of Oxford, UK |
| Alwen Tiu | Australian National University, Australia |
| Frank Valencia | LIX, Ecole Polytechnique, France |
| Daniele Varacca | LACL - Université Paris Est Créteil, France |

## Additional Reviewers

Aguirre, Alejandro; Akshay, S.; Aranda, Jesus; Arsiwalla, Xerxes; Asada, Kazuyuki; Aubert, Clément; Bacci, Giorgio; Bahr, Patrick; Balachander, Mrudula; Balaji, Nikhil; Balasubramanian, A. R.; Baldan, Paolo; Bansal, Suguman; Barbarossa, Davide; Basold, Henning; Benerecetti, Massimo; Bengtson, Jesper; Bernardi, Giovanni; Boker, Udi; Bonchi, Filippo; Brice, Léonard; Béal, Marie-Pierre; Casares, Antonio; Castiglioni, Valentina; Chockler, Hana; Chroboczek, Juliusz; Clairambault, Pierre; Clemente, Lorenzo; Clouston, Ranald; Cohen, Liron; Corbyn, Nathan; Corradini, Andrea; Danielsson, Nils Anders; Dantchev, Stefan; de Groot, Jim; de Vilhena, Paulo; Dell'Erba, Daniele; Demangeon, Romain; Dima, Catalin; Dragoi, Cezara; Dubut, Jérémy; Fahrenberg, Uli; Feier, Cristina; Fijalkow, Nathanaël; Finster, Eric; Fiterau-Brostean, Paul; Freund, Anton; Ganty, Pierre; Gavazzo, Francesco; Geeraerts, Gilles; Ghyselen, Alexis; Goy, Alexandre; Gratzer, Daniel; Guilmant, Quentin; Gurke, Sebastian; Gutierrez, Julian; Hadzihasanovic, Amar; Hamel-de Le Court, Edwin; Hansen, Helle Hvid; Helouet, Loic; Henry, Léo; Hirschowitz, Tom; Hofman, Piotr; Hou, Zhe; Jaber, Guilhem; Jaquard, Arthur; Jindal, Gorav; Jonsson, Bengt; Kappé, Tobias; Karimov, Toghrul; Kavvos, Alex; Kelmendi, Edon; Kerjean, Marie; Kopczynski, Eryk; Kruckman, Alex; Lebeda, Christian Janos; Li, Yong; Lucyshyn-Wright, Rory; Luttik, Bas; Main, James C. A.; Marin, Sonia; Markey, Nicolas; Mascle, Corto; Mathur, Umang; Mazza, Damiano; McKenzie, Pierre; Michaliszyn, Jakub; Michaux, Christian; Mimram, Samuel; Morales Elena, Marianela; Nieuwveld, Joris; Niewerth, Matthias; Niwinski, Damian; Norrish, Michael; Nuyts, Andreas; Olarte, Carlos; Oliva, Paulo; Pagani, Michele; Patterson, Evan; Perez, Guillermo; Piedeleu, Robin; Pinzón, Carlos; Pommellet, Adrien; Pous, Damien; Pradic, Pierre; Praveen, M.; Purser, David; Ramírez, Sergio; Raskin, Jean-Francois; Reynouard, Raphaël; Riba, Colin; Román, Mario; Rossberg, Andreas; Rot, Jurriaan; Saivasan, Prakash; Sakayori, Ken; Sanan, David; Sangnier, Arnaud; Sankur, Ocan; Schmid, Todd; Schmitz, Sylvain; Shevrin, Ilia; Shillito, Ian; Shirmohammadi, Mahsa; Skrzypczak, Michał; Sokolova, Ana; Spies, Simon; Stefanesco, Leo; Stefański, Rafał; Stein, Dario; Sterling, Jonathan; Totzke, Patrick; Traytel, Dmitriy; Tsampas, Stelios; Tsukada, Takeshi; Ulrik, Nikolaj Jensen; Urbat, Henning; Vahanwala, Mihir; van der Weide, Niels; van Dijk, Tom; van Glabbeek, Rob; van Gool, Sam; Vandenhove, Pierre; Vignudelli, Valeria; Vilmart, Renaud; Vákár, Matthijs; Wagemaker, Jana; Wang, Di; Weininger, Maximilian; Winskel, Glynn; Winter, Sarah; Wißmann, Thorsten; Worrell, James; Yamakami, Tomoyuki; Yatapanage, Nisansala

## When Programs Have to Watch Paint Dry

Danel Ahman

Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia danel.ahman@fmf.uni-lj.si

Abstract. We explore type systems and programming abstractions for the safe usage of resources. In particular, we investigate how to use types to modularly specify and check *when* programs are allowed to use their resources, e.g., when programming a robot arm on a production line, it is crucial that painted parts are given enough time to dry before assembly. We capture such *temporal resources* using a time-graded variant of Fitch-style modal type systems, develop a corresponding modally typed, effectful core calculus, and equip it with a graded-monadic denotational semantics illustrated by a concrete presheaf model. Our calculus also includes graded algebraic effects and effect handlers. They are given a novel temporally aware treatment in which operations' specifications include their execution times and their continuations know that an operation's worth of additional time has passed before they start executing, making it possible to safely access further temporal resources in them.

Keywords: Temporal resources · Modal types · Graded monads · Algebraic effects · Effect handlers.

## 1 Introduction

The correct usage of resources is at the heart of many programs, especially if they control safety-critical machinery. Such resources can take many different forms: ensuring that file handles are not arbitrarily duplicated or discarded (as captured by linear and uniqueness types) [11,25,40], or guaranteeing that communication happens according to protocols (as specified by session types) [30,70], or controlling how data is laid out in memory (as in Hoare and separation logics) [2,34,56,64], or assuring that resources are correctly finalised [1,43].

In contrast to the above approaches that predominantly focus on *how* resources are used, we study how to modularly specify and verify *when* programs can use their resources—we call such resources *temporal*. For instance, consider the following code snippet controlling a robot arm on a (car) production line:

let (body', left-door', right-door') = paint (body, left-door, right-door) in
assemble (body', left-door', right-door')

Here, the correct execution of the program (and thus operation of the robot arm it is controlling) relies on the car parts being given enough time to dry between painting and assembly. Therefore, in its current form, the above code is correct only if a compiler (or a scheduler) inserts enough of a time delay at compile time (resp. dynamically blocks the program's execution for enough time) between the calls to paint and assemble. However, in either case, one still faces the question of how to reason about the correctness of the compiled code (resp. dynamic checks).

In this paper, we focus on developing a type-system-based means for reasoning about the temporal correctness of the code that the above-mentioned compiler might produce, or that a programmer might write directly when full control of the code is important. In particular, we had *three desiderata* we set out to fulfil:


**Paper Structure** We achieve these goals by designing a *mathematically natural core programming language* for safe and correct programming with temporal resources: on the one hand, based on a time-graded, temporal variant of *Fitch-style modal type systems* [19,27], and on the other hand, on *graded monads* [35,51,67].

We review modal types and discuss how we use them to capture temporal resources in §2. In §3, we present λ<sub>[τ]</sub>, our modally typed, effectful, equationally presented core calculus for safe programming with temporal resources. We justify the design of λ<sub>[τ]</sub> by giving it a mathematically natural sound denotational semantics in §4, based on graded monads and adjunctions between strong monoidal functors, including a concrete presheaf example. In §5, we briefly discuss a specialisation of λ<sub>[τ]</sub> with equations for time delays. We review related work and remark on future work in §6, and conclude in §7. This paper is also accompanied by an online appendix (https://arxiv.org/abs/2210.07738) that presents further details of renamings and denotational semantics that we omit in §3 and §4.

For supplementary rigour, we have formalised the main results of §3 and §4 also in Agda [68], available at https://github.com/danelahman/temporalresources/releases/tag/fossacs2023. Regrettably, it currently lacks (i) proofs of some auxiliary lemmas noted in Prop. 4 due to a bug in Agda where with-abstractions produce ill-typed terms,<sup>1</sup> and (ii) two laws of the presheaf model because unfolding of definitions produces unmanageably large terms for Agda.

## 2 Modal Types for Temporal Resources

We begin with an overview of (Fitch-style) modal type systems and how a time-graded variant of them naturally captures temporal aspects of resources.

<sup>1</sup> Eta-contraction is not type-preserving: https://github.com/agda/agda/issues/2732

#### 2.1 (Fitch-Style) Modal Types

A *modal type system* extends the types of an underlying type system with new *modal type formers*,<sup>2</sup> e.g., □X, which states that the type is to be considered and reasoned about in a *different mode* compared to X, which can take many forms. For instance, in Kripke's possible worlds semantics, □X means that values of type X are *available in all future worlds* [41]; in run-time code generation, the type □X captures *generators of X-typed code* [72]; and in asynchronous and distributed programming, the type □X specifies *mobile X-typed values* [3,54,63].

Many different approaches to presenting modal type systems have been developed, with one of the main culprits being the difficulty of getting the *introduction rule* for □X correct. Namely, bearing in mind Kripke's possible worlds semantics, the introduction rule for □X must allow one to use only those hypotheses that also hold in all future worlds, while at the same time ensuring that the system still enjoys expected structural properties. Solutions to this problem have involved proving □X in a context containing only □-types [62] (with a failure of structural properties in the naive approaches), or building a form of explicit substitutions into the introduction rule for □X to give the rule premise access to only □-types [12], or incorporating the Kripke semantics in the type system by explicitly indexing types with worlds [66]; see [37] for an in-depth survey.

In this paper, we build on *Fitch-style modal type systems* [15,19,27,48], where the typing rules for □X are given with respect to another modality, 🔒, that acts on contexts, resulting in a particularly pleasant type-theoretic presentation.

As an illustrative example, in a Fitch-style modal type system corresponding to the modal logic S4 (whose Kripke models require the order on worlds to be reflexive and transitive, thus also corresponding to natural properties of time), the typing rules for variables and the □X type have the following form:<sup>3</sup>

$$\frac{\text{🔒} \notin \Gamma'}{\Gamma, x : X, \Gamma' \vdash x : X}\;\text{Var} \qquad \frac{\Gamma, \text{🔒} \vdash t : X}{\Gamma \vdash \mathsf{shut}\; t : \Box X}\;\text{Shut} \qquad \frac{\Gamma \vdash t : \Box X}{\Gamma, \Gamma' \vdash \mathsf{open}\; t : X}\;\text{Open}$$

Intuitively, the *context modality* 🔒 creates a barrier in the premise of Shut so that only □-typed variables can be used from Γ in t, achieving the above-mentioned correctness goal for the introduction rule of □X. Alternatively, in the context of Kripke's possible worlds semantics, one can also read the occurrences of the 🔒 modality as advancing the underlying world: in Shut, t in the premise is typed in some future world compared to where shut t is typed. This intuition will be useful to how we use a similar modality to capture the passage of time in λ<sub>[τ]</sub>. The context weakening Γ, Γ′ in Open ensures the admissibility of structural rules, and in the possible worlds reading, it intuitively expresses that if □X is available in some world, then X will be available in all possible future worlds.

<sup>2</sup> For brevity, we use the term *modal type system* to interchangeably refer to both modal type systems and natural deduction systems of (intuitionistic) modal logics.

<sup>3</sup> Depending on which exact modal logic one is trying to capture, the form of contexts used in the introduction/elimination rules can differ, see [19] for a detailed overview.

#### 2.2 Modal Types for Temporal Resources

Next, we give a high-level overview of how we use a time-graded variant of Fitch-style modal type systems to capture temporal properties of resources in λ<sub>[τ]</sub>. For this, we use the production line code snippet from §1 as a working example.

**A Naive Approach** Before turning to modal types, a naive solution to achieve the desired time delay would be for paint to return the required drying time and for the program to delay execution for that time duration, e.g., as expressed in

let (τ<sub>dry</sub>, body', left-door', right-door') = paint (body, left-door, right-door) in
delay τ<sub>dry</sub>; assemble (body', left-door', right-door')

It is not difficult to see that we could generalise this solution to allow performing other useful activities while waiting for τ<sub>dry</sub> time to pass. So are we done and can we conclude the paper here? Well, no, because this solution puts all the burden for writing correct code on the shoulders of the programmer, with successful typechecking giving no additional guarantees that τ<sub>dry</sub> indeed will have passed.

**A Temporal Resource Type** Instead, inspired by Fitch-style modal type systems and Kripke's possible worlds semantics of the □-modality, we propose a *temporal resource type*, written [τ] X, to specify that a value of type X will become available for use in *at most* τ time units, or to put it differently, the boxed value of type X can be explicitly unboxed only when *at least* τ time units have passed. Concretely, [τ] X is presented by the following two typing rules:

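$$\frac{\Gamma, \langle \tau \rangle \vdash V : X}{\Gamma \vdash \mathsf{box}_{\tau}\, V : [\tau]\, X}\;\text{Box} \qquad \frac{\tau \le \mathsf{time}\; \Gamma \qquad \Gamma - \tau \vdash V : [\tau]\, X \qquad \Gamma, x : X \vdash N : Y \mathrel{!} \tau'}{\Gamma \vdash \mathsf{unbox}_{\tau}\, V \;\mathsf{as}\; x \;\mathsf{in}\; N : Y \mathrel{!} \tau'}\;\text{Unbox}$$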

Above, τs are natural numbers that count discrete time moments, and Y ! τ′ is a type of computations returning Y-typed values and executing in τ′ time units.

Analogously to the context modality 🔒 of Fitch-style modal type systems, we introduce a similar *modality on contexts*, written ⟨τ⟩, to express that when typechecking a term of the form Γ, ⟨τ⟩ ⊢ V : X, we can safely assume that *at least* τ time will have passed before V is accessed or executed, as in the premise of the Box rule. Accordingly, in Unbox, we require that at least τ time units have passed since the resource V of type [τ] X was created or brought into scope, by typing V in the "earlier" context Γ − τ (we define this operation in §3.3).

Encapsulating temporal resources as a type gives us flexible first-class access to them, and allows them to be packed in data structures and passed to functions.
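To make the contrast with a purely dynamic discipline concrete, the following is a minimal Python sketch of a *run-time checked* analogue of a temporal resource. It is not part of the calculus: the type system described here rules out premature unboxing statically, whereas this sketch only detects it when the access happens. The names `Clock`, `Boxed`, and `unbox` are hypothetical, and time is modelled by a simulated counter of discrete moments.

```python
class Clock:
    """Simulated clock counting discrete time moments (an assumption of this sketch)."""

    def __init__(self):
        self.now = 0

    def advance(self, tau):
        # Models tau time units passing, e.g. through a delay or an operation call.
        self.now += tau


class Boxed:
    """Run-time stand-in for a temporal resource of type [tau] X."""

    def __init__(self, value, tau, clock):
        self._value = value
        self._clock = clock
        self._ready_at = clock.now + tau  # earliest moment at which unboxing is safe

    def unbox(self):
        # Dynamic counterpart of requiring that at least tau time units have passed.
        if self._clock.now < self._ready_at:
            raise RuntimeError("temporal resource accessed too early")
        return self._value


clock = Clock()
door = Boxed("painted door", tau=5, clock=clock)

clock.advance(3)  # only 3 of the 5 required time units have passed
try:
    door.unbox()
    too_early_rejected = False
except RuntimeError:
    too_early_rejected = True

clock.advance(2)  # 5 time units have now passed in total
result = door.unbox()
```

Because the resource is an ordinary value, it can be stored in a list or passed to a function before it is ready; only the call to `unbox` is guarded.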

**Modelling Passage of Time** As we see in the Unbox rule, we can unbox a temporal resource only when enough time has passed since its creation. This raises the question: how can the passage of time be modelled within the type system? For this, we propose a new notion of *temporally aware graded algebraic effects*, where each operation op is specified not only by its parameter and result types, but also by its prescribed execution time, and with op's continuation knowing that op's worth of additional time has passed before it begins executing. We refer the reader to [8,31,35,60] for background on ordinary (graded) algebraic effects.

For instance, the paint operation, taking τ<sub>paint</sub> time, is typed in λ<sub>[τ]</sub> as<sup>4</sup>

$$\frac{\begin{array}{c} \Gamma \vdash V : \mathsf{Body} \times \mathsf{Door} \times \mathsf{Door} \\ \Gamma, \langle \tau_{\mathsf{paint}} \rangle, x : [\tau_{\mathsf{dry}}]\,\mathsf{Body} \times [\tau_{\mathsf{dry}}]\,\mathsf{Door} \times [\tau_{\mathsf{dry}}]\,\mathsf{Door} \vdash M : X \mathrel{!} \tau \end{array}}{\Gamma \vdash \mathsf{paint}\; V\; (x . M) : X \mathrel{!} \tau_{\mathsf{paint}} + \tau}$$

Here, ⟨τ<sub>paint</sub>⟩ expresses that from the perspective of any unboxes in M, an *additional* τ<sub>paint</sub> *time* will have passed compared to the beginning of the execution of paint V (x.M), which is typed in the "earlier" context Γ. Also, observe that paint's result x is available *after* τ<sub>paint</sub> time has passed (i.e., after paint finishes), and its type has the car part types wrapped as temporal resources, ensuring that any further operations (e.g., assemble) can access them only after *at least* τ<sub>dry</sub> time has passed *after* paint finishes. The delay τ operation is typed analogously.

Finally, similarly to algebraic operations, we also use the context modality ⟨τ⟩ to model the passage of time in sequential composition, as specified in

$$\frac{\Gamma \vdash M : X \mathrel{!\tau} \qquad \Gamma, \langle \tau \rangle, x : X \vdash N : Y \mathrel{!\tau'}}{\Gamma \vdash \text{let } x = M \text{ in } N : Y \mathrel{!\tau} + \tau'}$$

The type X ! τ (for specifying the execution time of computations) is standard from graded-monad-style effect systems [35]. The novelty of our work is to use this effect information to inform continuations that they can safely assume that the given amount of additional time has passed before they start executing.
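As a small worked instance of the sequential composition rule, with illustrative grades that are our own choice rather than taken from the paper, suppose M takes 3 time units and N takes 4:

$$\frac{\Gamma \vdash M : X \mathrel{!} 3 \qquad \Gamma, \langle 3 \rangle, x : X \vdash N : Y \mathrel{!} 4}{\Gamma \vdash \mathsf{let}\; x = M \;\mathsf{in}\; N : Y \mathrel{!} 3 + 4}$$

The grades add up to 7 in the conclusion, and the continuation N is checked under ⟨3⟩. Reading the Unbox rule of this section informally, a temporal resource already present in Γ whose grade is at most 3 should therefore be accessible in N without any further delay.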

**Putting It All Together** We conclude this overview by revisiting the production line code snippet and note that in the λ<sub>[τ]</sub>-calculus we can write it as

let (body', left-door', right-door') = paint (body, left-door, right-door) in
delay τ<sub>dry</sub>;
unbox body' as body" in
unbox left-door' as left-door" in
unbox right-door' as right-door" in
assemble (body", left-door", right-door")

Observe that apart from the unbox operations, the code looks identical to the naive, unsafe solution discussed earlier. However, crucially, now any code that wants to use the outputs of paint will typecheck only if these resources are accessed after at least τ<sub>dry</sub> time units have passed after paint finishes. In the code snippet, this is achieved by blocking execution with delay τ<sub>dry</sub> for τ<sub>dry</sub> time units, but this could have been equally well achieved by executing other useful operations op<sub>1</sub>; ...; op<sub>n</sub>, as long as they collectively take at least τ<sub>dry</sub> time.
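The time bookkeeping behind this example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's type checker: contexts are lists of variable bindings and ⟨τ⟩ entries, only variables of boxed type are tracked, and the context operation Γ − τ (defined in §3.3) is approximated by asking whether the ⟨τ⟩ entries to the right of a variable add up to at least its grade. The names `Var`, `Passed`, `time_since`, and `can_unbox`, and the numeric grades, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Context entry x : [tau] X; only the grade tau of the boxed type is tracked."""
    name: str
    tau: int


@dataclass(frozen=True)
class Passed:
    """Context modality <tau>: at least tau time units have passed."""
    tau: int


def time_since(ctx, name):
    """Sum of the <tau> entries to the right of the most recent binding of `name`."""
    for i in range(len(ctx) - 1, -1, -1):
        entry = ctx[i]
        if isinstance(entry, Var) and entry.name == name:
            return sum(e.tau for e in ctx[i + 1:] if isinstance(e, Passed))
    raise KeyError(name)


def can_unbox(ctx, name):
    """Simplified Unbox check: the grade must not exceed the time passed since binding."""
    for entry in reversed(ctx):
        if isinstance(entry, Var) and entry.name == name:
            return entry.tau <= time_since(ctx, name)
    raise KeyError(name)


TAU_PAINT, TAU_DRY = 2, 5  # illustrative values, not from the paper

# Context of paint's continuation: <tau_paint>, body' : [tau_dry] Body
after_paint = [Passed(TAU_PAINT), Var("body'", TAU_DRY)]
# Context after `delay tau_dry`
after_delay = after_paint + [Passed(TAU_DRY)]
# Alternative: two other operations taking 2 and 3 time units instead of the delay
after_ops = after_paint + [Passed(2), Passed(3)]

unsafe = can_unbox(after_paint, "body'")
safe_delay = can_unbox(after_delay, "body'")
safe_ops = can_unbox(after_ops, "body'")
```

In this sketch, unboxing directly after paint is rejected, while both the explicit delay and the pair of other operations make the access acceptable, mirroring the observation that any operations collectively taking at least τ<sub>dry</sub> time suffice.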

<sup>4</sup> We present λ<sub>[τ]</sub> formally using algebraic operations with explicit continuations, while in code snippets we use so-called *generic effects* [59] without explicit continuations.

$$\begin{array}{lrcl} \text{Time grade:} & \tau & \in & \mathbb{N} \\ \text{Ground type:} & A, B, C & ::= & \mathsf{b} \mid \mathsf{unit} \mid A \times B \mid [\tau]\, A \\ \text{Value type:} & X, Y, Z & ::= & A \mid X \times Y \mid X \to Y \mathrel{!} \tau \mid [\tau]\, X \\ \text{Computation type:} & & & X \mathrel{!} \tau \end{array}$$

Fig. 1. Types of λ<sub>[τ]</sub>.

## 3 A Calculus for Programming with Temporal Resources

We now recast the ideas explained above as a formal, modally typed, effectful core calculus, called λ<sub>[τ]</sub>. We base it on the fine-grain call-by-value λ-calculus [44].

#### 3.1 Types

The types of λ<sub>[τ]</sub> are given in Fig. 1. *Ground types* include base types b, and are closed under finite products and the modal *temporal resource type* [τ] A. The latter denotes that an A-typed value will become available in *at most* τ time units, where τ ∈ ℕ counts discrete time moments.<sup>5</sup> The ground types can also come with *constants* f with associated *constant signatures* f : (A<sub>1</sub>, ..., A<sub>n</sub>) → B.

To model operations such as paint and assemble discussed in §2.2, we assume a set of *operation symbols* O, with each op ∈ O assigned an *operation signature* op : A<sub>op</sub> ⇝ B<sub>op</sub> ! τ<sub>op</sub>, which specifies that op accepts inputs of type A<sub>op</sub>, returns values of type B<sub>op</sub>, and its execution takes τ<sub>op</sub> time units. Observe that by typing operations with ground types, as opposed to simply with base types, we can specify operations such as paint : Part ⇝ ([τ<sub>dry</sub>] Part) ! τ<sub>paint</sub>, returning values that can be accessed only after a certain amount of time, here, after τ<sub>dry</sub>.

*Value types* extend ground types with the *function type* X → Y ! τ that specifies functions taking X-typed arguments to computations that return Y-typed values and take τ time to execute, as expressed by the *computation type* Y ! τ.
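The grammar of types and the restriction of operation signatures to ground types can be sketched as a small Python data model. This is an illustration only; the class names (`Base`, `Unit`, `Prod`, `Box`, `Fun`, `OpSig`), the helper `is_ground`, and the numeric grades are assumptions introduced here and do not come from the paper or its Agda formalisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:   # base type b
    name: str


@dataclass(frozen=True)
class Unit:   # unit
    pass


@dataclass(frozen=True)
class Prod:   # A x B (ground) or X x Y (value)
    left: object
    right: object


@dataclass(frozen=True)
class Box:    # [tau] A (ground) or [tau] X (value)
    tau: int
    body: object


@dataclass(frozen=True)
class Fun:    # X -> Y ! tau
    arg: object
    res: object
    tau: int


def is_ground(t):
    """True if t is built only from base types, unit, products and [tau]."""
    if isinstance(t, (Base, Unit)):
        return True
    if isinstance(t, Prod):
        return is_ground(t.left) and is_ground(t.right)
    if isinstance(t, Box):
        return is_ground(t.body)
    return False  # function types are value types but not ground types


@dataclass(frozen=True)
class OpSig:  # operation signature with argument type, result type and execution time
    arg: object
    res: object
    tau: int

    def well_formed(self):
        # Operation signatures may mention ground types only.
        return is_ground(self.arg) and is_ground(self.res)


TAU_PAINT, TAU_DRY = 2, 5  # illustrative grades
part = Base("Part")
paint = OpSig(arg=part, res=Box(TAU_DRY, part), tau=TAU_PAINT)
bad = OpSig(arg=Fun(part, part, 0), res=part, tau=1)
```

Here `paint` is accepted because its result type is a temporal resource over a base type, whereas `bad` is rejected because its argument is a function type.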

#### 3.2 Terms

The syntax of terms is given in Fig. 2, separated into values and computations.

*Values* include variables, constants, finite tuples, functions, and the *boxing up of temporal resources*, box<sub>τ</sub> V, which allows us to consider an arbitrary value V as a temporal resource as long as it is safe to access V after τ time units.

*Computations* include returning values, sequential composition, function application, pattern-matching<sup>6</sup>, algebraic operation calls, effect handling, and the *unboxing of temporal resources*, where given a temporal resource V of type [τ] X, the computation unbox<sub>τ</sub> V as x in N is used to access the underlying value of type X if at least τ time units have passed since the creation of the resource V.

<sup>5</sup> For concreteness, we work with (**N**, 0, +, ∸, ≤) for time grades, but we do not foresee problems generalising these to come from other analogous algebraic structures.

<sup>6</sup> The form let (x, y, z) = M in N in §1,2 is the natural combination of let and match.

Fig. 2. Values, computations, and effect handlers of λ<sub>[τ]</sub>.

In addition to user-specifiable operation calls (via operation signatures and effect handling), we include a separate delay τ M operation that blocks the execution of its continuation for the given amount of time. For simplicity, we require effect handlers to have *operation clauses* M<sub>op</sub> for all op ∈ O, but we do not allow delays to be handled, in light of the equations we want to hold for them in §5, where all consecutive delays are collapsed and all zero-delays are removed.

#### 3.3 Type System

We now equip λ<sub>[τ]</sub> with a modal type-and-effect system. On the one hand, for modelling temporal resources, we build on Fitch-style modal type systems [19]. On the other hand, for modelling effectful computations and their specifications, we build on type-and-effect systems for calculi based on graded monads [35].

The *typing judgements* are written as Γ ⊢ V : X and Γ ⊢ M : X ! τ, where τ specifies M's execution time and Γ is a *temporal typing context*, given by

$$
\Gamma ::= \cdot \mid \Gamma, x : X \mid \Gamma, \langle \tau \rangle
$$

Here, ⟨τ⟩ is a *temporal context modality*, akin to the context modality of Fitch-style systems. We use it to express that when typechecking a term of the form Γ, ⟨τ⟩ ⊢ V : X, we can safely assume that *at least* τ time will have passed before the resource V is accessed or executed. The *rules* defining these judgements are given in Fig. 3.

Fig. 3. Typing rules of λ<sub>[τ]</sub>.

In contrast to Fitch-style modal type systems discussed in §2.1, Var does not restrict the Γ′ right of x to not include any context modalities. This is so because in the possible worlds reading of λ<sub>[τ]</sub> (see §4) we treat all types as being monotone for time. This is not usually the case for formulae in modal logics such as S4, but in λ<sub>[τ]</sub> it models that once any value is available it will remain so.

As in systems based on graded monads, Return specifies that returning a value takes zero time, and Let that the execution time of sequentially composed computations is the sum of the individual ones. Novel to λ<sub>[τ]</sub>, Let, Op, Delay, and Handle state that the continuations can safely assume that the relevant amount of additional time has passed before they start executing, as discussed in §2.2.

When typing the operation clauses M<sub>op</sub> in Handle, we universally quantify (at the meta-level) over the execution time τ″ of the continuation k of M<sub>op</sub>. We do so as the operation clauses M<sub>op</sub> must be able to execute at any point when effect handling recursively traverses M. Further, observe that k is wrapped inside a resource type. This ensures that k is invoked only after τ<sub>op</sub> amount of time has been spent in M<sub>op</sub>, thus guaranteeing that the temporal discipline is respected. Note that this enforces a *linear* discipline for our effect handlers: for τ<sub>op</sub> > 0, k must be executed exactly once for M<sub>op</sub>'s execution time to match τ<sub>op</sub> + τ″.

Finally, Box specifies that in order to box up a value V of type X as a temporal resource of type [τ] X, we must be able to type V when assuming that τ additional time units will have passed before V is accessed. At the same time, Unbox specifies that we can unbox a temporal resource V of type [τ] X only if at least τ time units have passed since its creation: the time captured by Γ must be at least τ, and we must be able to type V in a τ time units "earlier" context Γ − τ. The *time captured by a context*, time Γ, is calculated recursively as

$$\mathsf{time}\ \cdot \stackrel{\text{def}}{=} 0 \qquad \mathsf{time}\ (\Gamma, x : X) \stackrel{\text{def}}{=} \mathsf{time}\ \Gamma \qquad \mathsf{time}\ (\Gamma, \langle \tau \rangle) \stackrel{\text{def}}{=} \mathsf{time}\ \Gamma + \tau$$

and the *"time travelling" operation* Γ − τ as (where τ<sub>+</sub> = 1 + τ″ for some τ″)

$$\begin{array}{l}
\Gamma - 0 \stackrel{\text{def}}{=} \Gamma \qquad \cdot - \tau_{+} \stackrel{\text{def}}{=} \cdot \qquad (\Gamma, x : X) - \tau_{+} \stackrel{\text{def}}{=} \Gamma - \tau_{+} \\
(\Gamma, \langle \tau' \rangle) - \tau_{+} \stackrel{\text{def}}{=} \text{if } \tau_{+} \leqslant \tau' \text{ then } \Gamma, \langle \tau' \mathbin{\dot{-}} \tau_{+} \rangle \text{ else } \Gamma - (\tau_{+} \mathbin{\dot{-}} \tau')
\end{array}$$

taking Γ to an "earlier" state by removing τ worth of modalities and variables.
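The two context operations can be made concrete with a small executable sketch. The encoding below is our own illustration, not part of the calculus: a context is a Python list read left to right, `Var` and `Mod` are hypothetical names for variable bindings and modalities ⟨τ⟩, types are opaque strings, and subtracting from the empty context is assumed to yield the empty context.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    """A variable binding x : X (the type is kept as an opaque string)."""
    name: str
    type: str


@dataclass(frozen=True)
class Mod:
    """A temporal context modality <tau>."""
    tau: int


# Leftmost entry first; the empty list plays the role of the empty context.
Context = List[Union[Var, Mod]]


def ctx_time(ctx: Context) -> int:
    """time Gamma: the sum of all modalities occurring in the context."""
    return sum(entry.tau for entry in ctx if isinstance(entry, Mod))


def ctx_minus(ctx: Context, tau: int) -> Context:
    """Gamma - tau: the context as it was tau time units 'earlier'."""
    if tau == 0:
        return list(ctx)                      # Gamma - 0 = Gamma
    if not ctx:
        return []                             # assumed clause for the empty context
    *rest, last = ctx
    if isinstance(last, Var):
        return ctx_minus(rest, tau)           # variables bound too late are dropped
    if tau <= last.tau:
        return rest + [Mod(last.tau - tau)]   # shrink the rightmost modality
    return ctx_minus(rest, tau - last.tau)    # consume it entirely and continue


def can_unbox(ctx: Context, tau: int) -> bool:
    """Side condition of Unbox: at least tau time is captured by the context."""
    return tau <= ctx_time(ctx)
```

For the context x : X, ⟨3⟩, y : Y, ⟨2⟩ this gives time 5, and subtracting 4 yields x : X, ⟨1⟩: the modality ⟨2⟩ and the variable y are removed, and ⟨3⟩ shrinks to ⟨1⟩.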

#### 3.4 Admissibility of Renamings and Substitutions

We now show that expected *structural* and *substitution rules* [7] are admissible.

Theorem 1. *The typing relations* Γ ⊢ V : X *and* Γ ⊢ M : X ! τ *are closed under standard structural rules of weakening, exchange of consecutive variables, and contraction (omitted here). Furthermore, both typing relations are also closed under rules making* ⟨−⟩ *into a strong monoidal functor (with a co-strength) [45]:*

$$\frac{\Gamma, \langle 0 \rangle \vdash J}{\Gamma \vdash J} \quad \frac{\Gamma, \langle \tau\_1 + \tau\_2 \rangle \vdash J}{\Gamma, \langle \tau\_1 \rangle, \langle \tau\_2 \rangle \vdash J} \quad \frac{\Gamma, \langle \tau \rangle \vdash J \quad \tau \leqslant \tau'}{\Gamma, \langle \tau' \rangle \vdash J} \quad \frac{\Gamma, \langle \tau \rangle, x: X \vdash J}{\Gamma, x: X, \langle \tau \rangle \vdash J}$$

*where* Γ ⊢ J *ranges over both typing relations, where the first two rules hold in both directions, and the last rule expresses that if we can type* J *using a variable "now", we can also type* J *if that variable was brought into scope "earlier".*

*Proof.* First, we define a *renaming relation* ρ : Γ ⇝ Γ′, and then prove by induction that if Γ ⊢ J and ρ : Γ ⇝ Γ′ then Γ′ ⊢ J[ρ], where J[ρ] is J renamed with ρ. The relation is defined as the reflexive-transitive-congruent closure of rules corresponding to the desired structural rules, e.g., var<sup>r</sup><sub>x:X∈Γ</sub> : Γ, y : X ⇝ Γ and μ<sup>r</sup> : Γ, ⟨τ<sub>1</sub> + τ<sub>2</sub>⟩ ⇝ Γ, ⟨τ<sub>1</sub>⟩, ⟨τ<sub>2</sub>⟩. The full list is given in the online appendix.

For the Var and Unbox cases of the proof, we show that if ρ : Γ ⇝ Γ′ and x ∈<sub>τ</sub> Γ, then ρ x ∈<sub>τ′</sub> Γ′ for some τ′ with τ ≤ τ′, where x ∈<sub>τ</sub> Γ means that x ∈ Γ and there is τ worth of modalities right of x in Γ, and ρ x is the variable that ρ maps x to. For Unbox, we further prove that if ρ : Γ ⇝ Γ′, then for any τ we can build ρ − τ : Γ − τ ⇝ Γ′ − τ, using the result about ∈<sub>τ</sub> to ensure that ρ does not map any x ∈ Γ − τ outside of Γ′ − τ. We also establish that if Γ ⇝ Γ′, then time Γ ≤ time Γ′, allowing us to deduce τ ≤ time Γ′ from τ ≤ time Γ.

The admissibility of the rules corresponding to μ<sup>r</sup> (and its inverse) relies on us having defined context splitting in Unbox using Γ − τ, as opposed to more rigidly as Γ, Γ′, as in [19], as then it would be problematic if the split happened between ⟨τ<sub>1</sub>⟩, ⟨τ<sub>2</sub>⟩. Inverses of the last two rules in Thm. 1 are not valid, as they would allow unboxing temporal resources without enough time having passed.

Theorem 2. *The typing relations* Γ ⊢ V : X *and* Γ ⊢ M : X ! τ *are closed under substitution, i.e., if* Γ, x : X, Γ′ ⊢ J *and* Γ ⊢ W : X*, then* Γ, Γ′ ⊢ J[W/x]*, where* J[W/x] *is standard recursively defined capture-avoiding substitution [7].*

*Proof.* The proof proceeds by induction on the derivation of Γ, x : X, Γ′ ⊢ J. The most involved case is Unbox, where we construct the derivation of Γ, Γ′ ⊢ unbox<sub>τ</sub> V[W/x] as y in N[W/x] : Y ! τ′ by first analysing whether τ ≤ time Γ′, which tells us whether x is in the context (Γ, x : X, Γ′) − τ of V, based on which we learn whether W continues to be substituted for x in V or whether V[W/x] = V.
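To illustrate this case analysis, consider the hypothetical instance Γ′ = ⟨2⟩ (our own example, chosen only for concreteness). Unfolding the definition of the Γ − τ operation for τ = 1 and τ = 3 gives

$$\begin{aligned}
(\Gamma, x : X, \langle 2 \rangle) - 1 &= \Gamma, x : X, \langle 1 \rangle && (1 \leqslant \mathsf{time}\ \Gamma' = 2) \\
(\Gamma, x : X, \langle 2 \rangle) - 3 &= (\Gamma, x : X) - 1 = \Gamma - 1 && (3 > \mathsf{time}\ \Gamma' = 2)
\end{aligned}$$

In the first case x is still in scope for the boxed value, so the substitution of W for x has to be propagated into it. In the second case x has been removed from the context, so x cannot occur in the boxed value and the substitution leaves it unchanged.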

#### 3.5 Equational Theory

We conclude the definition of λ<sub>[τ]</sub> by equipping it with an *equational theory* to reason about program equivalence, defined using judgements Γ ⊢ V ≡ W : X and Γ ⊢ M ≡ N : X ! τ, where we presuppose that the terms are well-typed for the given contexts and types. The rules defining these relations are given in Fig. 4. We omit standard equivalence, congruence, and substitutivity rules [7].

The equational theory consists of standard β/η-equations for the unit, product, and function types. We also include monadic equations for return and let [52]. For op and delay, we include algebraicity equations, allowing us to pull them out of let [8]. For handle, we include equations expressing that effect handling recursively traverses a term, replacing each op-occurrence with the operation clause M<sub>op</sub>, leaving delays untouched, and finally executes the continuation N when reaching return values [61]. Finally, we include β/η-equations for box and unbox, expressing that unbox behaves as a pattern-matching elimination form for box.
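As a small worked example of how these equations combine (our own illustration, not taken from Fig. 4 verbatim), the algebraicity equation for delay followed by the β-equation for return and let lets a delay be pulled out of a sequential composition and the returned value be substituted into the continuation:

$$\begin{aligned}
\text{let } x = (\text{delay } \tau \ (\text{return } V)) \text{ in } N
&\equiv \text{delay } \tau \ (\text{let } x = (\text{return } V) \text{ in } N) \\
&\equiv \text{delay } \tau \ (N[V/x])
\end{aligned}$$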

## 4 Denotational Semantics

We justify the design of λ<sub>[τ]</sub> by giving it a mathematically natural semantics based on *adjunctions between strong monoidal functors* [45] (modelling modalities) and a *strong*<sup>7</sup> *graded monad* [35] (modelling computations). We assume general knowledge of category theory, only spelling out details specific to λ<sub>[τ]</sub>. To optimise for space, we discuss the abstract model structure simultaneously with a concrete example using presheaves [46], but note that the interpretation is defined, and its soundness proved, with respect to the abstract structure.

<sup>7</sup> To be more specific, we use a modal notion of [−]*-strength* that we define below.

$$() \equiv V : \mathsf{unit} \tag{\eta}$$

$$\text{fun } (x:X) \mapsto Vx \equiv V:X \to Y \mathrel{!} \tau \tag{\eta}$$

$$(\text{fun } (x : X) \mapsto M) \, V \equiv M[V/x] \tag{\beta}$$

$$\text{match}\ (V, W) \text{ with } \{(x, y) \mapsto N\} \equiv N[V/x, W/y] \tag{\beta}$$

$$\text{match } V \text{ with } \{(x, y) \mapsto N[(x, y)/z] \} \equiv N[V/z] \tag{\eta}$$

$$\text{let } x = \text{(return } V \text{) in } N \equiv N[V/x] \tag{\beta}$$

$$\text{let } y = \text{(let } x = M \text{ in } N\text{) in } P \equiv \text{let } x = M \text{ in } \text{(let } y = N \text{ in } P\text{)}\tag{\beta}$$

$$\text{let } x = M \text{ in return } x \equiv M \tag{\eta}$$

$$\text{let } x = (\text{op } V \ (y \,.\, M)) \text{ in } N \equiv \text{op } V \ (y \,.\, \text{let } x = M \text{ in } N) \tag{\beta}$$

$$\text{let } x = (\text{delay } \tau \ M) \text{ in } N \equiv \text{delay } \tau \text{ (let } x = M \text{ in } N\text{)}\tag{\beta}$$

$$\text{handle (return } V \text{) with } H \text{ to } x \text{ in } N \equiv N[V/x] \tag{\beta}$$

$$\text{handle } (\text{op } V \ (y \,.\, M)) \text{ with } H \text{ to } x \text{ in } N \equiv M_{\mathsf{op}}[V/x, \mathsf{box}_{\tau_{\mathsf{op}}} (\text{fun } (y : B_{\mathsf{op}}) \mapsto \text{handle } M \text{ with } H \text{ to } x \text{ in } N) / k] \tag{\beta}$$

$$\text{handle } (\text{delay } \tau \ M) \text{ with } H \text{ to } x \text{ in } N \equiv \text{delay } \tau \ (\text{handle } M \text{ with } H \text{ to } x \text{ in } N) \tag{\beta}$$

$$\mathsf{unbox}\_{\tau} \text{ (}\mathsf{box}\_{\tau} V\text{) as } x \text{ in } N \equiv N[V/x] \tag{\beta}$$

$$\mathsf{unbox}_{\tau} \; V \; \text{as } x \text{ in } N[\mathsf{box}_{\tau} \; x/y] \equiv N[V/y] \tag{\eta}$$

Fig. 4. Equational theory of λ<sub>[τ]</sub>.

When referring to the *abstract model structure*, we denote the underlying category with **C**. Meanwhile, the *concrete presheaf example* is given in Set<sup>(**N**,≤)</sup>, consisting of functors from (**N**, ≤) to the category Set of sets and functions.

The model in Set<sup>(**N**,≤)</sup> is similar to Kripke's possible worlds semantics, except that in Set<sup>(**N**,≤)</sup> all objects are *monotone* for ≤, i.e., for any A ∈ Set<sup>(**N**,≤)</sup> we have functions A(t<sub>1</sub> ≤ t<sub>2</sub>) : A(t<sub>1</sub>) → A(t<sub>2</sub>) respecting reflexivity and transitivity, whereas Kripke models are commonly given by discretely indexed presheaves and only modalities change worlds. For λ<sub>[τ]</sub>, working in Set<sup>(**N**,≤)</sup> gives us that when a resource becomes available, it will remain so without need for reboxing, leading to a more natural system for temporal resources and a simpler Var rule.

#### 4.1 Interpretation of Types

Value Types and Contexts. To interpret value types, we require the category **C** to have *finite products* (**1**, A × B) and *exponentials* A ⇒ B, so as to model the unit, product, and function types. In Set<sup>(**N**,≤)</sup>, the former are given pointwise using the finite products in Set, and the latter are given as (A ⇒ B)(t) ≝ Set<sup>(**N**,≤)</sup>(hom t × A, B), where hom t : (**N**, ≤) → Set is the covariant *hom-functor* for (**N**, ≤), given by hom t ≝ t ≤ (−) [46]. When unfolding it further, the above means that (A ⇒ B)(t) is the set of functions (f<sub>t′</sub> : A(t′) → B(t′))<sub>t′ ∈ {t′ ∈ **N** | t ≤ t′}</sub> that are natural in t′, capturing the intuition that in λ<sub>[τ]</sub> functions can be applied in any future context. For base types, we require an object ⟦b⟧ of **C** for each b.

To interpret the temporal resource type, we require a *strong monoidal functor* [−] : (**N**, ≤) → [**C**, **C**], where [**C**, **C**] is the category of endofunctors on **C**. This means that we have functors [τ] : **C** → **C**, for all τ ∈ **N**, together with morphisms [τ<sub>1</sub> ≤ τ<sub>2</sub>]<sub>A</sub> : [τ<sub>1</sub>]A → [τ<sub>2</sub>]A, natural in A and respecting ≤. Strong monoidality of [−] means that we have natural isomorphisms ε<sub>A</sub> : [0]A → A and δ<sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub> : [τ<sub>1</sub> + τ<sub>2</sub>]A → [τ<sub>1</sub>]([τ<sub>2</sub>]A), satisfying time-graded variants of comonad laws [10]:

$$\varepsilon \circ \delta\_{A,0,\tau} \equiv \mathrm{id} \qquad \left[\tau\right](\varepsilon) \circ \delta\_{A,\tau,0} \equiv \mathrm{id} \qquad \delta\_{\left[\tau\_3\right]A,\tau\_1,\tau\_2} \circ \delta\_{A,\tau\_1+\tau\_2,\tau\_3} \equiv \left[\tau\_1\right](\delta) \circ \delta$$

We also require (δ<sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub>, δ<sup>−1</sup><sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub>) to be monotone in τ<sub>1</sub>, τ<sub>2</sub>, i.e., if τ<sub>1</sub> ≤ τ<sub>1</sub>′ and τ<sub>2</sub> ≤ τ<sub>2</sub>′, then [τ<sub>1</sub>′]([τ<sub>2</sub> ≤ τ<sub>2</sub>′]) ∘ [τ<sub>1</sub> ≤ τ<sub>1</sub>′] ∘ δ ≡ δ ∘ [τ<sub>1</sub> + τ<sub>2</sub> ≤ τ<sub>1</sub>′ + τ<sub>2</sub>′]<sub>A</sub>. We omit the indices of the components of natural transformations when convenient.

In Set<sup>(**N**,≤)</sup>, we define ([τ]A)(t) ≝ A(t + τ), with [τ]A-values given by future A-values, and with (ε<sub>A</sub>, ε<sup>−1</sup><sub>A</sub>, δ<sub>A</sub>, δ<sup>−1</sup><sub>A</sub>) given by identities on A-values, combined with the laws of (0, +), e.g., as (ε<sub>A</sub>)<sub>t</sub> (a ∈ ([0]A)(t) = A(t + 0)) ≝ a ∈ A(t).

Using the above, we interpret a value type X as an object ⟦X⟧ of **C**, as

$$\begin{array}{c}
\llbracket A \rrbracket \stackrel{\text{def}}{=} \llbracket A \rrbracket^g \qquad \llbracket \mathsf{unit} \rrbracket \stackrel{\text{def}}{=} \mathbb{1} \qquad \llbracket X \times Y \rrbracket \stackrel{\text{def}}{=} \llbracket X \rrbracket \times \llbracket Y \rrbracket \\
\llbracket X \to Y \mathrel{!} \tau \rrbracket \stackrel{\text{def}}{=} \llbracket X \rrbracket \Rightarrow T\,\tau\,\llbracket Y \rrbracket \qquad \llbracket [\tau]\, X \rrbracket \stackrel{\text{def}}{=} [\tau] \llbracket X \rrbracket
\end{array}$$

where T is a graded monad for modelling computations (we return to it below). The interpretation of ground types ⟦A⟧<sup>g</sup> is defined similarly, so we omit it here.

Next, we define the interpretation of contexts, for which we require another *strong monoidal functor*, ⟨−⟩ : (**N**, ≤)<sup>op</sup> → [**C**, **C**]. Note that ⟨−⟩ is *contravariant*, which enables us to model the structural rules that allow terms typed in an earlier context to be used in future ones (see Thm. 1). We denote the strong monoidal structure of ⟨−⟩ with natural isomorphisms η<sub>A</sub> : A → ⟨0⟩A and μ<sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub> : ⟨τ<sub>1</sub>⟩(⟨τ<sub>2</sub>⟩A) → ⟨τ<sub>1</sub> + τ<sub>2</sub>⟩A, required to satisfy time-graded variants of monad laws [45], given by

$$\mu\_{A,0,\tau} \circ \eta \equiv \text{id} \qquad \mu\_{A,\tau,0} \circ \langle \tau \rangle(\eta) \equiv \text{id} \qquad \mu\_{A,\tau\_1+\tau\_2,\tau\_3} \circ \mu\_{\langle \tau\_3 \rangle A,\tau\_1,\tau\_2} \equiv \mu \circ \langle \tau\_1 \rangle(\mu)$$

and (μ<sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub>, μ<sup>−1</sup><sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub>) have to be monotone in τ<sub>1</sub>, τ<sub>2</sub>, similarly to (δ, δ<sup>−1</sup>) above.

In Set<sup>(**N**,≤)</sup>, we define (⟨τ⟩A)(t) ≝ (τ ≤ t) × A(t ∸ τ), as past A-values, with the *side-condition* τ ≤ t crucial for the existence of the adjunctions ⟨τ⟩ ⊣ [τ] we require below. We define (η<sub>A</sub>, η<sup>−1</sup><sub>A</sub>, μ<sub>A</sub>, μ<sup>−1</sup><sub>A</sub>) similarly to earlier, as identities on A-values, combined with the laws of (0, +, ∸), so as to satisfy the side-conditions.

With this, we can interpret *contexts* Γ as *functors* ⟦Γ⟧ : **C** → **C**, given by:

$$\llbracket \cdot \rrbracket A \stackrel{\text{def}}{=} A \qquad \llbracket \Gamma, x : X \rrbracket A \stackrel{\text{def}}{=} \llbracket \Gamma \rrbracket A \times \llbracket X \rrbracket \qquad \llbracket \Gamma, \langle \tau \rangle \rrbracket A \stackrel{\text{def}}{=} \langle \tau \rangle (\llbracket \Gamma \rrbracket A)$$

We interpret contexts as functors to easily manipulate denotations of composite contexts, e.g., we then have isomorphisms ι<sub>Γ;Γ′;A</sub> : ⟦Γ, Γ′⟧A → ⟦Γ′⟧(⟦Γ⟧A), natural in A.

Finally, to formulate the semantics of computation types and terms, we require there to be a family of *adjunctions* ⟨τ⟩ ⊣ [τ], i.e., natural transformations η<sup>⊣</sup><sub>A,τ</sub> : A → [τ](⟨τ⟩A) (the *unit*) and ε<sup>⊣</sup><sub>A,τ</sub> : ⟨τ⟩([τ]A) → A (the *counit*), for all τ ∈ **N**, satisfying time-graded variants of standard adjunction laws [45], given by

$$\varepsilon^{\dashv}_{\langle \tau \rangle A, \tau} \circ \langle \tau \rangle (\eta^{\dashv}_{A, \tau}) \equiv \mathsf{id} \qquad [\tau] (\varepsilon^{\dashv}_{A, \tau}) \circ \eta^{\dashv}_{[\tau] A, \tau} \equiv \mathsf{id}$$

We also require (η<sup>⊣</sup>, ε<sup>⊣</sup>) to interact well with the strong monoidal structures:

$$\begin{aligned}
[\tau](\langle 0 \leqslant \tau \rangle) \circ \eta^{\dashv}_{A,\tau} \circ \eta^{-1} \circ \varepsilon &\equiv [0 \leqslant \tau] & \qquad [\tau_1]([\tau_2](\mu)) \circ [\tau_1](\eta^{\dashv}_{\langle \tau_1 \rangle A, \tau_2}) \circ \eta^{\dashv}_{A,\tau_1} &\equiv \delta \circ \eta^{\dashv} \\
\langle 0 \rangle([0 \leqslant \tau]) \circ \eta \circ \varepsilon^{-1} \circ \varepsilon^{\dashv}_{A,\tau} &\equiv \langle 0 \leqslant \tau \rangle & \qquad \varepsilon^{\dashv}_{A,\tau_1} \circ \langle \tau_1 \rangle(\varepsilon^{\dashv}_{[\tau_1] A, \tau_2}) \circ \langle \tau_1 \rangle(\langle \tau_2 \rangle(\delta)) &\equiv \varepsilon^{\dashv} \circ \mu
\end{aligned}$$

Proposition 1. *It then follows that* η<sup>⊣</sup><sub>A,0</sub> ≡ ε<sup>−1</sup><sub>⟨0⟩A</sub> ∘ η<sub>A</sub> *and* ε<sup>⊣</sup><sub>A,0</sub> ≡ ε<sub>A</sub> ∘ η<sup>−1</sup><sub>[0]A</sub>*.*

In Set<sup>(**N**,≤)</sup>, η<sup>⊣</sup><sub>A,τ</sub> and ε<sup>⊣</sup><sub>A,τ</sub> are given by identities on A-values, respectively combined with τ ≤ t + τ and monotonicity for (t ∸ τ) + τ = t. For the latter, we crucially know τ ≤ t due to the side-condition included in the definition of ⟨−⟩.
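Unfolding the presheaf definitions of the two modalities makes the role of the side-condition explicit (a routine calculation, spelled out here only for convenience):

$$\begin{aligned}
([\tau](\langle \tau \rangle A))(t) &= (\langle \tau \rangle A)(t + \tau) = (\tau \leqslant t + \tau) \times A((t + \tau) \mathbin{\dot{-}} \tau) = (\tau \leqslant t + \tau) \times A(t) \\
(\langle \tau \rangle([\tau] A))(t) &= (\tau \leqslant t) \times ([\tau] A)(t \mathbin{\dot{-}} \tau) = (\tau \leqslant t) \times A((t \mathbin{\dot{-}} \tau) + \tau)
\end{aligned}$$

In the first line the condition always holds, so the unit can pair any A-value at time t with it. In the second line the condition gives that the time index (t ∸ τ) + τ equals t, so the counit can return the A-value unchanged. Without the condition, e.g., for t = 0 and τ = 1, the index would be (0 ∸ 1) + 1 = 1, and an A-value at time 1 cannot in general be turned into one at time 0.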

We note that modulo the time grades τ, the above structure is analogous to the models of the Fitch-style presentation of S4 [19], where □ is modelled by an idempotent comonad, the context modality by an idempotent monad, and boxing/unboxing by the adjunction between the two. This is also why we present [−] and ⟨−⟩ as comonad- and monad-like.

Computation Types. For computation types, we require a [−]*-strong graded monad* (T, η<sup>T</sup>, μ<sup>T</sup>, str<sup>T</sup>) on **C**, with grades in **N**.<sup>8</sup> In detail, this means a functor T : **N** → [**C**, **C**], together with natural transformations η<sup>T</sup><sub>A</sub> : A → T 0 A (the *unit*), μ<sup>T</sup><sub>A,τ<sub>1</sub>,τ<sub>2</sub></sub> : T τ<sub>1</sub> (T τ<sub>2</sub> A) → T (τ<sub>1</sub> + τ<sub>2</sub>) A (the *multiplication*), and str<sup>T</sup><sub>A,B,τ</sub> : [τ]A × T τ B → T τ (A × B) (the *strength*), with the first two satisfying standard graded monad laws (see [35] or (η, μ) of ⟨−⟩). Below we only present the laws for str<sup>T</sup> because it has a novel temporal aspect to it: its first argument appears under [τ]. As such, str<sup>T</sup> expresses that if we know an A-value will be available after τ time units, we can push it into computations taking τ time to execute.

We say that T is a [−]-strong graded monad following the parlance of Bierman and de Paiva [12], who model the possibility modality ◇A as a □-strong monad. While the laws governing str<sup>T</sup> are not overly different from standard graded strength laws [35], we have to correctly account for [−] in them:

$$\begin{aligned}
\mathsf{str}^T_{A,B,0} \circ (\varepsilon_A^{-1} \times \eta^T_B) &\equiv \eta^T_{A \times B} & \qquad \mu^T_{A \times B, \tau_1, \tau_2} \circ T(\mathsf{str}^T) \circ \mathsf{str}^T &\equiv \mathsf{str}^T \circ (\delta^{-1} \times \mu^T) \\
T(\mathsf{snd}) \circ \mathsf{str}^T_{A,B,\tau} &\equiv \mathsf{snd} & \qquad T(\alpha) \circ \mathsf{str}^T \circ (\mathsf{m} \times \mathsf{id}) \circ \alpha^{-1} &\equiv \mathsf{str}^T_{A,B \times C,\tau} \circ (\mathsf{id} \times \mathsf{str}^T)
\end{aligned}$$

where α<sub>A,B,C</sub> : (A × B) × C → A × (B × C) is the associativity isomorphism, and m<sub>A,B,τ</sub> : [τ]A × [τ]B → [τ](A × B) witnesses that [τ] is monoidal for ×, which follows from [τ] being a right adjoint [45]. Observe that it is the [−]-strength that naturally gives T a temporal flavour; the rest of it is standard [35]. Below we show that str<sup>T</sup> is also mathematically natural, admitting an analogous characterisation to ordinary strength.

<sup>8</sup> As λ<sub>[τ]</sub> does not include sub-effecting (see §6.2), a discretely graded monad T suffices.

Proposition 2. *Analogously to ordinary strong and enriched monads [39],* T *having* [−]*-strength is equivalent to* [−]-enrichment *of* T*, given by morphisms* [τ](A ⇒ B) → (T τ A ⇒ T τ B) *respecting* **C***'s self-enrichment [38] and* (η<sup>T</sup>, μ<sup>T</sup>)*.*

In order to model operations op and delay in §4.2, we require T to be equipped with *algebraic operations*: we ask there to be families of natural transformations op<sup>T</sup><sub>A,τ</sub> : ⟦A<sub>op</sub>⟧<sup>g</sup> × [τ<sub>op</sub>](⟦B<sub>op</sub>⟧<sup>g</sup> ⇒ T τ A) → T (τ<sub>op</sub> + τ) A, for all op : A<sub>op</sub> ⇝ B<sub>op</sub> ! τ<sub>op</sub> ∈ O, and delay<sup>T</sup><sub>A,τ′</sub> τ : [τ](T τ′ A) → T (τ + τ′) A, for all τ ∈ **N**, satisfying algebraicity laws [61], which state that both commute with μ<sup>T</sup> and str<sup>T</sup>, e.g.,

$$\mathsf{str}^T_{A,B,\tau+\tau'} \circ (\mathsf{id} \times \mathsf{delay}^T\, \tau) \equiv \mathsf{delay}^T_{A \times B,\tau'}\, \tau \circ [\tau](\mathsf{str}^T) \circ \mathsf{m} \circ (\delta_{A,\tau,\tau'} \times \mathsf{id})$$

In Set<sup>(**N**,≤)</sup>, we can define T as the initial algebra of a corresponding signature functor for operations op and delay, analogously to the usual treatment of algebraic effects [8]. Concretely, such T is determined inductively by three cases

$$\frac{a \in A(t)}{\mathsf{ret}\ a \in (T\,0\,A)(t)} \quad \frac{a \in \llbracket A_{\mathsf{op}} \rrbracket^g(t) \quad k \in ([\tau_{\mathsf{op}}](\llbracket B_{\mathsf{op}} \rrbracket^g \Rightarrow T\,\tau\,A))(t)}{\mathsf{op}\ a\ k \in (T\,(\tau_{\mathsf{op}} + \tau)\,A)(t)} \quad \frac{k \in ([\tau](T\,\tau'\,A))(t)}{\mathsf{delay}\ \tau\ k \in (T\,(\tau + \tau')\,A)(t)}$$

with (η<sup>T</sup>, μ<sup>T</sup>, str<sup>T</sup>, op<sup>T</sup>, delay<sup>T</sup>) defined in the expected way, e.g., str<sup>T</sup> is given by recursively traversing a computation of type T τ B and moving the argument of type [τ]A under ret cases, modifying τ when going under the op and delay cases.
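The inductive description of T as trees can be sketched executably. The code below is a deliberately simplified, untyped illustration of ours: it ignores the presheaf time index t, the grades in the types, and the resource modality around continuations. The names `Ret`, `Op`, `Delay`, `bind`, `run` and the operation table `OPS` (with a made-up `paint` operation taking 3 time units) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Union


@dataclass
class Ret:
    """Leaf: return a value, taking zero time."""
    value: Any


@dataclass
class Op:
    """Node: call operation `name` on `arg`, then continue with `cont(result)`."""
    name: str
    arg: Any
    cont: Callable[[Any], "Comp"]


@dataclass
class Delay:
    """Node: wait `tau` time units, then continue with `cont`."""
    tau: int
    cont: "Comp"


Comp = Union[Ret, Op, Delay]

# Hypothetical operation table: name -> (tau_op, function computing the result).
OPS: Dict[str, Tuple[int, Callable[[Any], Any]]] = {
    "paint": (3, lambda part: part + " (painted)"),
}


def bind(m: Comp, f: Callable[[Any], Comp]) -> Comp:
    """Sequential composition: graft f onto every return leaf of m."""
    if isinstance(m, Ret):
        return f(m.value)
    if isinstance(m, Op):
        return Op(m.name, m.arg, lambda y: bind(m.cont(y), f))
    return Delay(m.tau, bind(m.cont, f))


def run(m: Comp) -> Tuple[Any, int]:
    """Execute a tree with OPS, returning the final value and the elapsed time."""
    elapsed = 0
    while not isinstance(m, Ret):
        if isinstance(m, Op):
            tau_op, impl = OPS[m.name]
            elapsed += tau_op          # an operation call contributes tau_op
            m = m.cont(impl(m.arg))
        else:
            elapsed += m.tau           # a delay contributes tau
            m = m.cont
    return m.value, elapsed


# Painting (3 time units) followed by a delay of 2 takes 5 time units in total.
program = bind(Op("paint", "door", Ret), lambda p: Delay(2, Ret(p.upper())))
```

In this sketch `bind` plays the role of the multiplication composed with functorial action, and the elapsed time reported by `run` for a composed program is the sum of the times of its parts, mirroring how the grades add up in the Let rule.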

#### 4.2 Interpretation of Value and Computation Terms

The interpretation of values and computations is defined simultaneously. We only present the temporally interesting cases—full details are in the online appendix.

As λ<sub>[τ]</sub> does not have sub-effecting and includes enough type annotations for typing derivations to be unique, this interpretation is *coherent* by construction.

Values. We assume a morphism ⟦f⟧ : ⟦A<sub>1</sub>⟧<sup>g</sup> × ... × ⟦A<sub>n</sub>⟧<sup>g</sup> → ⟦B⟧<sup>g</sup> for every f : (A<sub>1</sub>, ..., A<sub>n</sub>) → B. We interpret a well-typed value Γ ⊢ V : X as a morphism ⟦Γ ⊢ V : X⟧ : ⟦Γ⟧**1** → ⟦X⟧ in **C** by induction on the given typing derivation.

Most of the value cases are standard, and analogous to other calculi based on fine-grain call-by-value [44] and graded monads [35], using the Cartesian-closed structure of **C**. The temporally interesting cases are Var and Box, given by

$$\begin{aligned}
\llbracket \Gamma, x : X, \Gamma' \vdash x : X \rrbracket \stackrel{\text{def}}{=} {} & \llbracket \Gamma, x : X, \Gamma' \rrbracket \mathbb{1} \xrightarrow{\iota} \llbracket \Gamma' \rrbracket (\llbracket \Gamma \rrbracket \mathbb{1} \times \llbracket X \rrbracket) \\
& \xrightarrow{e} \langle \mathsf{time}\ \Gamma' \rangle (\llbracket \Gamma \rrbracket \mathbb{1} \times \llbracket X \rrbracket) \xrightarrow{\varepsilon^{\langle\rangle}} \llbracket \Gamma \rrbracket \mathbb{1} \times \llbracket X \rrbracket \xrightarrow{\mathsf{snd}} \llbracket X \rrbracket
\end{aligned}$$

$$\llbracket \Gamma \vdash \mathsf{box}_{\tau}\, V : [\tau]\, X \rrbracket \stackrel{\text{def}}{=} \llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\eta^{\dashv}} [\tau](\langle \tau \rangle (\llbracket \Gamma \rrbracket \mathbb{1})) \xrightarrow{[\tau](\llbracket V \rrbracket)} [\tau] \llbracket X \rrbracket$$

where e<sub>A,Γ</sub> : ⟦Γ⟧A → ⟨time Γ⟩A extracts and collapses all temporal modalities in Γ, and the counit-like ε<sup>⟨⟩</sup><sub>A,τ</sub> is given by the composite of ⟨0 ≤ τ⟩<sub>A</sub> : ⟨τ⟩A → ⟨0⟩A followed by η<sup>−1</sup><sub>A</sub> : ⟨0⟩A → A.

Computations. We interpret a well-typed computation Γ ⊢ M : X ! τ as a morphism ⟦Γ ⊢ M : X ! τ⟧ : ⟦Γ⟧**1** → T τ ⟦X⟧ in **C** by induction on the typing derivation. The definition is largely unsurprising and follows a pattern similar to [35,44]; the novelty lies in controlling the occurrences of ⟨−⟩ and [−].

In Let, we use ⟨τ⟩ ⊣ [τ] to push the environment "into the future", and then follow the standard monadic strength-followed-by-multiplication pattern [35,52]:

$$\begin{aligned}
\llbracket \Gamma \vdash \text{let } x = M \text{ in } N : Y \mathrel{!} \tau + \tau' \rrbracket \stackrel{\text{def}}{=} {} & \llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\langle \eta^{\dashv}, \llbracket M \rrbracket \rangle} [\tau](\langle \tau \rangle (\llbracket \Gamma \rrbracket \mathbb{1})) \times T\,\tau\,\llbracket X \rrbracket \xrightarrow{\mathsf{str}^T} T\,\tau\,(\langle \tau \rangle (\llbracket \Gamma \rrbracket \mathbb{1}) \times \llbracket X \rrbracket) \\
& \xrightarrow{T(\llbracket N \rrbracket)} T\,\tau\,(T\,\tau'\,\llbracket Y \rrbracket) \xrightarrow{\mu^T} T\,(\tau + \tau')\,\llbracket Y \rrbracket
\end{aligned}$$

An analogous use of ⟨τ⟩ ⊣ [τ] also appears in the cases for operations, e.g., in

$$\begin{aligned}
\llbracket \Gamma \vdash \mathsf{op}\ V\ (x \,.\, M) : X \mathrel{!} \tau_{\mathsf{op}} + \tau \rrbracket \stackrel{\text{def}}{=} {} & \llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\langle \llbracket V \rrbracket, \eta^{\dashv} \rangle} \llbracket A_{\mathsf{op}} \rrbracket^g \times [\tau_{\mathsf{op}}](\langle \tau_{\mathsf{op}} \rangle (\llbracket \Gamma \rrbracket \mathbb{1})) \\
& \xrightarrow{\mathsf{id} \times [\tau_{\mathsf{op}}](\mathsf{curry}\, \llbracket M \rrbracket)} \llbracket A_{\mathsf{op}} \rrbracket^g \times [\tau_{\mathsf{op}}](\llbracket B_{\mathsf{op}} \rrbracket^g \Rightarrow T\,\tau\,\llbracket X \rrbracket) \xrightarrow{\mathsf{op}^T} T\,(\tau_{\mathsf{op}} + \tau)\,\llbracket X \rrbracket
\end{aligned}$$

Next, the Unbox case of the interpretation is defined as

$$\begin{aligned}
\llbracket \Gamma \vdash \mathsf{unbox}_{\tau}\ V \text{ as } x \text{ in } N : Y \mathrel{!} \tau' \rrbracket \stackrel{\text{def}}{=} {} & \llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\langle \mathsf{id}, \eta^{\mathsf{PRA}} \rangle} \llbracket \Gamma \rrbracket \mathbb{1} \times \langle \tau \rangle (\llbracket \Gamma - \tau \rrbracket \mathbb{1}) \\
& \xrightarrow{\mathsf{id} \times \langle \tau \rangle (\llbracket V \rrbracket)} \llbracket \Gamma \rrbracket \mathbb{1} \times \langle \tau \rangle ([\tau] \llbracket X \rrbracket) \xrightarrow{\mathsf{id} \times \varepsilon^{\dashv}} \llbracket \Gamma \rrbracket \mathbb{1} \times \llbracket X \rrbracket \xrightarrow{\llbracket N \rrbracket} T\,\tau'\,\llbracket Y \rrbracket
\end{aligned}$$

showing that temporal resources follow the common pattern in which elimination forms are modelled by counits of adjunctions, whereas units model introduction forms (akin to functions). The morphism η<sup>PRA</sup><sub>Γ,A,τ</sub> : ⟦Γ⟧A → ⟨τ⟩(⟦Γ − τ⟧A) extracts and collapses τ worth of context modalities in Γ, as long as τ ≤ time Γ. It is a semantic counterpart to an observation that the context modality Γ, ⟨τ⟩ is a *parametric right adjoint* to the Γ − τ operation, as in recent dependently typed presentations of Fitch-style modal types [27], see §6.1 for further discussion.

Finally, we discuss the interpretation of effect handling. For this, we additionally require **C** to have *set-indexed products* Π<sub>i∈I</sub> A<sub>i</sub> and *handling morphisms*

$$\begin{aligned} \left( \chi\_{A,\tau,\tau'} : \Pi\_{\mathsf{op}\in\mathsf{C}} \Pi\_{\tau'' \in \mathbb{N}} \left( (\left\| A\_{\mathsf{op}} \right\| ^g \times \left\{ \tau\_{\mathsf{op}} \right\} (\left\| B\_{\mathsf{op}} \right\| ^g \Rightarrow T \,\tau'' \, A) \right) & \Rightarrow T \left( \tau\_{\mathsf{op}} + \tau'' \right) \, A \right) \\ & \qquad \to T \,\tau \left( T \,\tau' \, A \right) \Rightarrow T \left( \tau + \tau' \right) \, A \end{aligned}$$

satisfying laws which state that χ<sub>A</sub> returns a graded T-algebra [22,50], e.g., we require uncurry(χ<sub>A,0,τ′</sub>) ∘ (id × η<sup>T</sup>) = snd, where uncurry (and curry earlier) is part of the universal property of A ⇒ B. We also require similar laws for χ's interaction with op<sup>T</sup> and delay<sup>T</sup>. In Set<sup>(ℕ,≤)</sup>, χ is defined by recursively traversing a given tree, replacing all occurrences of op a k with the respective operation clauses.

Writing ℋ for the domain of χ<sub>⟦Y⟧,τ,τ′</sub>, the Handle case is then defined as

$$\begin{aligned} & \llbracket \Gamma \vdash \mathsf{handle}\; M \;\mathsf{with}\; H \;\mathsf{to}\; x \;\mathsf{in}\; N : Y \,!\, \tau + \tau' \rrbracket \;\stackrel{\text{def}}{=} \\ & \quad \llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\langle \mathsf{id}, \langle \eta^{\dashv}, \llbracket M \rrbracket \rangle \rangle} \llbracket \Gamma \rrbracket \mathbb{1} \times \big( [\tau] ( \langle \tau \rangle ( \llbracket \Gamma \rrbracket \mathbb{1} ) ) \times T \, \tau \, \llbracket X \rrbracket \big) \\ & \quad \xrightarrow{\mathsf{id} \times \mathsf{str}^{T}} \llbracket \Gamma \rrbracket \mathbb{1} \times T \, \tau \, \big( \langle \tau \rangle ( \llbracket \Gamma \rrbracket \mathbb{1} ) \times \llbracket X \rrbracket \big) \xrightarrow{\mathsf{id} \times T \, \tau \, (\llbracket N \rrbracket)} \llbracket \Gamma \rrbracket \mathbb{1} \times T \, \tau \, ( T \, \tau' \, \llbracket Y \rrbracket ) \\ & \quad \xrightarrow{\llbracket H \rrbracket \times \mathsf{id}} \mathcal{H} \times T \, \tau \, ( T \, \tau' \, \llbracket Y \rrbracket ) \xrightarrow{\mathsf{uncurry}( \chi_{\llbracket Y \rrbracket, \tau, \tau'} )} T \, (\tau + \tau') \, \llbracket Y \rrbracket \end{aligned}$$

where we write ⟦H⟧ for the point-wise interpretation of operation clauses

$$\llbracket \Gamma \rrbracket \mathbb{1} \xrightarrow{\langle \langle \mathsf{id} \rangle_{\tau'' \in \mathbb{N}} \rangle_{\mathsf{op} \in \mathcal{O}}} \Pi_{\mathsf{op} \in \mathcal{O}} \Pi_{\tau'' \in \mathbb{N}} \big( \llbracket \Gamma \rrbracket \mathbb{1} \big) \xrightarrow{\Pi_{\mathsf{op} \in \mathcal{O}} \, \Pi_{\tau'' \in \mathbb{N}} \big( \mathsf{curry} ( \llbracket M_{\mathsf{op}} \, \tau'' \rrbracket \circ \alpha^{-1} ) \big)} \mathcal{H}$$

### 4.3 Renamings, Substitutions, and Soundness

We now show how syntactic renamings and substitutions relate to semantic morphism composition, using which we then prove the interpretation to be *sound*.

Proposition 3. *Given* ρ : Γ ⇝ Γ′ *and* Γ ⊢ J*, then* ⟦J[ρ]⟧ = ⟦J⟧ ∘ ⟦ρ⟧<sub>**1**</sub>*, where the interpretation of renamings* ⟦ρ⟧<sub>A</sub> : ⟦Γ′⟧A → ⟦Γ⟧A *is defined by induction on the derivation of* ρ : Γ ⇝ Γ′*, with the morphism* ⟦ρ⟧<sub>A</sub> *also natural in* A*.*

Proposition 4. *Given* Γ, x : X, Γ′ ⊢ J *and* Γ ⊢ W : X*, we have* ⟦J[W/x]⟧ = ⟦J⟧ ∘ ι<sup>−1</sup><sub>Γ,x:X;Γ′;**1**</sub> ∘ ⟦Γ′⟧(⟨id, ⟦W⟧⟩) ∘ ι<sub>Γ;Γ′;**1**</sub>*, where* (ι, ι<sup>−1</sup>) *are discussed in §4.1.*

*Proof.* We prove both results by induction on the derivation of Γ ⊢ J. The proofs are unsurprising but require us to prove auxiliary lemmas about recursively defined renamings and semantic morphisms. For example, for Prop. 3, we show η<sup>PRA</sup> ∘ ⟦ρ⟧ = ⟨τ⟩(⟦ρ − τ⟧) ∘ η<sup>PRA</sup> : ⟦Γ′⟧A → ⟨τ⟩(⟦Γ − τ⟧A), and for Prop. 4, that η<sup>PRA</sup> ∘ ι = ⟨τ⟩(ι) ∘ η<sup>PRA</sup> : ⟦Γ, Γ′⟧A → ⟨τ⟩(⟦Γ′ − τ⟧(⟦Γ⟧A)), when τ ≤ time Γ′.

Theorem 3. *Given* Γ ⊢ I ≡ J *derived using the rules in §3.5, then* ⟦I⟧ = ⟦J⟧*.*

*Proof.* The proof proceeds by induction on the derivation of Γ ⊢ I ≡ J, using Prop. 3 and Prop. 4 to unfold the renamings and substitutions in the equations of §3.5, and using the properties of the abstract structure we required **C** to have.

## 5 Quotienting Delays

Observe that in λ<sub>[τ]</sub> the computations delay τ (delay τ′ M) and delay (τ + τ′) M cannot be proved equivalent, though in some situations this might be desired.

In order to deem the above two programs (and others alike) equivalent, we extend λ<sub>[τ]</sub>'s equational theory with the following natural equations for delays:

$$\text{delay } 0 \, M \equiv M \qquad \text{delay } \tau \text{ (delay } \tau' \, M) \equiv \text{delay } (\tau + \tau') \, M$$

Theorem 4. *If the algebraic operations* delay<sup>T</sup> *of* T *satisfy the two analogous equations, the interpretation of §4 is sound for this extended equational theory.*

For the concrete model in Set<sup>(ℕ,≤)</sup>, we have to *quotient* T [36] by these two equations; the resulting graded monad is determined inductively by the cases

$$\frac{k \in (S \, \tau \, A)(t)}{\mathtt{comp} \, k \in (T \, \tau \, A)(t)} \quad \frac{\tau > 0 \qquad k \in [\tau](S \, \tau' \, A)(t)}{\mathtt{delay} \, \tau \, k \in (T \, (\tau + \tau') \, A)(t)}$$

$$\frac{a \in A(t)}{\mathtt{ret} \, a \in (S \, 0 \, A)(t)} \qquad \frac{a \in \llbracket A_{\mathsf{op}} \rrbracket^g(t) \qquad k \in [\tau_{\mathsf{op}}](\llbracket B_{\mathsf{op}} \rrbracket^g \Rightarrow T \, \tau \, A)(t)}{\mathtt{op} \, a \, k \in (S \, (\tau_{\mathsf{op}} + \tau) \, A)(t)}$$

where (T τ A)(t) and (S τ A)(t) are defined simultaneously in such a way that only non-zero, non-consecutive delays can appear in the tree structure.
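To make the effect of the two delay equations concrete, the following minimal Python sketch normalises a computation tree so that only non-zero, non-consecutive delays remain. It is an illustration only: the names `Ret`, `Delay` and `normalise` are hypothetical, and operation nodes (and the grading by τ) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ret:
    """A returned value (leaf of the tree); hypothetical name."""
    value: object


@dataclass(frozen=True)
class Delay:
    """delay tau body; hypothetical name, operations are omitted."""
    tau: int
    body: object


def normalise(m):
    """Apply delay 0 M = M and delay t (delay t' M) = delay (t + t') M."""
    if isinstance(m, Delay):
        body = normalise(m.body)
        tau = m.tau
        if isinstance(body, Delay):
            # The normalised body has at most one leading delay: merge it.
            tau += body.tau
            body = body.body
        # Drop the delay entirely if nothing is left of it.
        return body if tau == 0 else Delay(tau, body)
    return m
```

For instance, under these assumptions `normalise(Delay(1, Delay(0, Delay(2, Ret("v")))))` collapses the three nested delays into a single `Delay(3, Ret("v"))`.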

## 6 Related and Future Work

#### 6.1 Related Work

We contribute to two prominent areas: (i) modal types and (ii) graded monads.

As noted in §2.1, *modal types* provide a mathematically natural means for capturing many aspects of programming. Adding to §2.1, types corresponding to the *eventually* and *always modalities* of temporal logics capture *functional reactive programming (FRP)* [18,32,42], including a combination with linearity and time-annotations to model resources [33], where *all* values are annotated with inhabitation times. Recently, FRP has also been studied in Fitch-style [6]. Starting with Nakano [55], modal types have also been used for *guarded recursion*, even in the dependently typed setting [5,14,47], including in Fitch-style [13].

We also note that λ<sub>[τ]</sub>'s time grades τ and the Γ − τ operation are closely related to recent dependently typed Fitch-style frameworks. Namely, [28] develops a *multimodal type theory* (MTT) where types [μ]X are indexed by 1-cells μ of a strict 2-category (a mode theory). The time grades τ of λ<sub>[τ]</sub> are an example of such mode theories, given by the delooping of ℕ, i.e., by a single 0-cell, τs as 1-cells, and τ ≤ τ′s as 2-cells. While ensuring the admissibility of and naturality under substitutions, MTT with its indirect elimination rule for [μ]X is weaker than earlier systems (such as [13]). The direct-style elimination rule is recovered in [27] by observing that in addition to Γ, ⟨μ⟩ being a left adjoint to [μ]X, it should further form a *parametric right adjoint (PRA)* [17,71] to contexts of the form Γ/(r : μ), where r is a substitution ·, ⟨μ⟩ ⇝ Γ. The operation Γ − τ in λ<sub>[τ]</sub> is an instance of this: μ is a τ, r corresponds to the condition τ ≤ time Γ in Unbox, contexts Γ/(r : μ) are given by Γ − τ, and the PRA situation is witnessed by renamings ((Γ − τ), ⟨τ⟩) ⇝ Γ, when τ ≤ time Γ, and Γ ⇝ ((Γ, ⟨τ⟩) − τ).

*Graded monads* provide a uniform framework for different effect systems and effect-based analyses [22,35,36,50,51]. A major contribution of ours is showing that context modalities can inform continuations of preceding computations' effects. While the theory of graded monads can be instantiated with any ordered monoid, we focus on natural numbers to model time, but do not expect complications generalising λ<sub>[τ]</sub> to other structures with the same properties as (ℕ, 0, +, ∸, ≤), and perhaps even to grading T and ⟨−⟩, [−] with different structures, akin to [23].

Our use of [τ] X to restrict when resources are available is somewhat reminiscent of *coeffects* [16,24,57,58] and *quantitative type systems* [4,49,53]. In these works, variables are graded by (semi)ring-valued rs, as x :<sub>r</sub> X, counting how many times and in which ways x is used, enabling applications such as liveness and dataflow analyses [57]. Semantically, these systems often interpret x :<sub>r</sub> X using a graded comonad, as □<sub>r</sub>X, where one can access X only if r = 1. Of such works, the closest to ours is that of Gaboardi et al. [23], who combine coeffects with effectful programs via distributive laws between the grades of coeffects and effects, allowing coeffectful analyses to be propagated through effectful computations.

We also note that the type [τ] X can also be intuitively viewed as a temporally graded variant of *promise types* [29,65], in that it expresses that a value of type X will be available in the future, but with additional time guarantees.

#### 6.2 Future Work

Currently, λ<sub>[τ]</sub> does not support *sub-effecting*: we cannot deduce from τ ≤ τ′ and Γ ⊢ M : X ! τ that Γ ⊢ M : X ! τ′. Of course, we can simulate this by inserting τ′ ∸ τ worth of explicit delays into M, but this is extremely intensional, fixing where delays happen. In particular, we cannot type equations such as let x = (return V) in N ≡ N[V/x] if return V was sub-effected to τ > 0, with the ⟨τ⟩ in N's context the culprit. However, when considering sub-effecting as a *coercion* coerce<sub>τ≤τ′</sub> M, we believe we can add it by considering equations stating that it will produce *all the possible ways* in which τ′ ∸ τ worth of delays could be inserted into M. Of course, this will require a more complex non-deterministic semantics.

It would be neat if λ<sub>[τ]</sub> also included *recursion* in a way that programs could make use of the temporal discipline. This is likely unattainable for general recursion, but we hope that *primitive recursion* (say, on natural numbers) can be added via *type-dependency* of time grades τ on the values being recursed on.

It would be interesting to combine λ<sub>[τ]</sub> with linear [25] and separation logics [34,64] to model *linear* and *spatial properties* of temporal resources. Another goal would be to add *concurrency*, e.g., using (multi)handlers [9,20,21]. We also plan to look into capturing *expiring* and *available-for-an-interval* style resources.

Further, we plan to study λ<sub>[τ]</sub>'s *operational semantics*, namely, one that takes time seriously and does not model delays simply as uninterpreted operations [9], together with developing a *prototype*, and proving *normalisation* akin to [26,69].

We also plan to study the *completeness* of the denotational semantics of λ<sub>[τ]</sub>. For such semantic investigations, it could be beneficial to also study the general theory of the kinds of temporally aware graded algebraic effects used in this paper, by investigating their *algebras* and *equational presentations* [36,50].

## 7 Conclusion

We have shown how a temporal, time-graded variant of Fitch-style modal type systems, when combined with an effect system based on graded monads, provides a natural framework for safe programming with temporal resources. To this end, we developed a modally typed, effectful, equationally-presented core calculus, and equipped it with a sound denotational semantics based on strong monoidal functors (for modelling modalities) and graded monads (for modelling effects). The calculus also includes temporally aware graded algebraic effects and effect handlers, with the continuations of the former knowing that an operation's worth of additional time has passed before they start executing, and where the user-defined effect handlers are guaranteed to respect this temporal discipline.

*Acknowledgements* We thank Andrej Bauer, Juhan-Peep Ernits, Niccolò Veltri, and Niels Voorneveld for useful discussions. We also thank one of the reviewers for drawing our attention to the recent work on presenting Fitch-style modal types in terms of parametric right adjoints, and its relationship to the work presented in this paper. This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-21-1-0024.

## References



## **Deciding Contextual Equivalence of** ν**-Calculus with Effectful Contexts**

Daniel Hirschkoff<sup>1</sup>, Guilhem Jaber<sup>2</sup>(✉), and Enguerrand Prebet<sup>1</sup>

<sup>1</sup> Université de Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRIA, LIP, France <sup>2</sup> Nantes Université, LS2N, Inria, France guilhem.jaber@univ-nantes.fr

**Abstract.** We prove decidability for contextual equivalence of the λμν-calculus, that is, the simply-typed call-by-value λμ-calculus equipped with booleans and fresh name creation, with contexts taken in λμref, that is, the λμν-calculus extended with higher-order references.

The proof exploits a labelled transition system capturing the interactions between λμν programs and λμref contexts. The induced bisimulation equivalence is characterized as equality of certain trees, inspired by the work of Lassen. Since these trees are computable and finite, decidability follows. Bisimulation also coincides with trace equivalence, which in turn coincides with contextual equivalence.

## **1 Introduction**

Dynamic allocation is central to many programming constructions. Many languages provide built-in support for dynamically-allocated resources, for example, objects in Java or references in ML. The creation of these resources is *local*, meaning that resources can be accessed only within their scope. They can also be passed around via function applications, in which case their scope is not static but evolves dynamically. When building semantics for such languages, one represents dynamic allocation as the creation of fresh locations, that can be seen as atoms or names.

In this paper, we study a paradigmatic language with dynamic allocation, namely the ν-calculus, a simply-typed call-by-value λ-calculus with fresh atom creation and equality test of atoms, as introduced by Pitts and Stark in [24]. For instance, the ν-calculus program new *n* in λ*x*.(*x* = *n*) allocates a new atom *n*, receives an atom *x* and returns the result of the comparison between *x* and *n*.

A central question while studying this language is to determine when two programs can be considered to be equivalent. The most studied approach to express behavioral equivalence between programs is contextual equivalence. Intuitively, two programs are deemed equivalent if and only if whenever they are run as part of an enclosing program called the *context*, it is not possible to distinguish one from the other. For instance, because the context has no way to guess the atom *n*, we expect the program above to be equivalent to λ*x*.false.
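As an informal illustration of this expected equivalence, the following Python sketch models atoms as objects compared by identity. The names `Atom`, `program`, `always_false` and `probe` are hypothetical, and the analogy only holds for contexts that do not inspect closures (which Python, unlike the ν-calculus, would allow).

```python
class Atom:
    """A fresh name: equal only to itself (identity-based comparison)."""


def program():
    n = Atom()               # new n in ...
    return lambda x: x is n  # ... λx.(x = n)


def always_false():
    return lambda x: False   # λx.false


def probe(make):
    """A simple context: call the function on three atoms of its own."""
    f = make()
    return [f(Atom()) for _ in range(3)]
```

Since the context can only supply atoms it created itself, `probe(program)` and `probe(always_false)` observe the same answers.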

Reasoning on contextual equivalence for the ν-calculus has proven to be challenging, due to the interplay between the higher-order control flow and the scope extrusion of atoms. A variety of frameworks has been introduced to do so, based on logical relations [24], environmental bisimulations [5], and game semantics [1].

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 24–45, 2023. https://doi.org/10.1007/978-3-031-30829-1_2

However, the question of whether this equivalence is *decidable* has remained open since the introduction of this language 30 years ago.

In this paper, we address this question by working in an asymmetric setting, giving contexts more discriminating power than just the mere creation of atoms. Indeed, contextual equivalence depends on two languages: the language for programs, and the language for contexts interacting with these programs. We take contexts in the λμref-calculus, an extension of the ν-calculus with both higher-order references and continuations. In this setting, atoms are simply references where only the unit value can be stored. Contextual equivalence is then coarser than in the symmetric setting, where the contexts are also taken in the ν-calculus. For example, one of the standard examples of equivalence in the literature

$$\text{new } n \text{ in new } n' \text{ in } \lambda f. (f \ n = f \ n') \approx_{\text{ctx}} \lambda f. \text{true}$$

is not an equivalence anymore, since a λμref context can provide a function that stores its argument in a reference and use it to discriminate these programs.
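The discriminating context can be sketched in Python as follows. This is only an analogy under stated assumptions: atoms are modelled as objects compared by identity, the list `seen` plays the role of a reference cell, and all names (`p1`, `p2`, `make_stateful_f`) are hypothetical.

```python
class Atom:
    """A fresh name, compared by identity."""


def p1():
    n, n2 = Atom(), Atom()           # new n in new n' in ...
    return lambda f: f(n) == f(n2)   # ... λf.(f n = f n')


def p2():
    return lambda f: True            # λf.true


def make_stateful_f():
    seen = []                        # stands in for a reference cell
    def f(x):
        if not seen:                 # first call: remember the argument
            seen.append(x)
            return True
        return x is seen[0]          # later calls: compare with the stored atom
    return f
```

With a stateless argument such as `lambda x: True` both programs answer `True`, whereas the stateful `f` makes `p1` answer `False` because the second call sees an atom different from the stored one.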

The main result we establish in this paper is the decidability of contextual equivalence for terms of the ν-calculus with contexts in the λμref-calculus. More generally, we establish this result for terms of the λμν-calculus, which corresponds to terms of the λμref-calculus that only use references storing the unit value.

To establish this result, we provide a Böhm-like tree representation [6,3] for the terms of the λμν-calculus. Being in call-by-value, equality of such trees coincides with Lassen's eager normal form bisimulations [16]. Moreover, since programs in the λμν-calculus are terminating, these trees, which we call *Lassen trees*, are finite. It is thus straightforward to check their equality. Then, we prove that Lassen tree equality is fully abstract, that is, it coincides with contextual equivalence with contexts in the λμref-calculus.

Proving this full-abstraction result is done through the introduction of an *operational game semantics* (OGS) for λμref by defining a Labelled Transition System (LTS) that distinguishes between internal operations, Proponent moves (originating in the program) and Opponent moves (originating in the context). Trace equivalence based on these labelled transitions is shown to coincide with the contextual equivalence of λμref.

The OGS also gives rise to a notion of *bipartite bisimulation*, describing a game between Proponent (the program in λμref) and Opponent (a context in λμref). Proponent reduces the program until it reaches a normal form, that triggers an interaction with the context. Along the game, knowledge is accumulated in configurations. When it is Opponent's turn to play, it chooses between answering a previous function call from Proponent, or generating a new function call, to which Proponent shall answer. Among this knowledge, we accumulate the atoms that have been disclosed by the two players, so that Opponent cannot use an atom private to Proponent.

The OGS LTS generates infinite trees since Opponent can interrogate an arbitrary number of times each value provided by Proponent. The Lassen trees used to decide contextual equivalence are generated using a *linearized* variant of the OGS LTS, called the *Prime Operational Game Semantics* (POGS) LTS. The POGS LTS enforces that Opponent interrogates only once each value provided by Proponent. For this linearization to be sound, one has to guess the disclosed status of atoms as soon as they are created. This can be illustrated by considering the following example of inequivalence

$$\text{new } n \text{ in } \lambda x.n \;\not\approx_{\text{ctx}}\; \lambda x.\text{new } n \text{ in } n.$$

Opponent must be able to interrogate at least twice each of these two programs to discriminate them. The first program would then return the same atom at each call, while the second program would return two different atoms. The Lassen tree of the first program would declare *n* to be disclosed when giving back the control to Opponent by providing the λ-abstraction, but this could not be matched by the second program, since *n* would not exist yet at that point of the interaction.
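The need for two interrogations can be illustrated with a small Python sketch, again modelling atoms as objects compared by identity. The names `shared`, `fresh_each` and `same_atom_twice` are hypothetical and only mimic the two programs above.

```python
class Atom:
    """A fresh name, compared by identity."""


def shared():
    n = Atom()               # new n in λx.n
    return lambda x: n


def fresh_each():
    return lambda x: Atom()  # λx.new n in n


def same_atom_twice(make):
    """A context that interrogates the function twice and compares the results."""
    f = make()
    return f(None) is f(None)
```

A single call returns an atom unknown to the context in both cases; only the comparison of two results separates `shared` from `fresh_each`.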

The main technical challenge at this point is to prove that this forecasting of the disclosure process is sound and complete. This is done by proving that the bipartite bisimilarities defined over the OGS LTS and the POGS LTS coincide. One direction is proven by lifting POGS bisimulations into OGS bisimulations via an *up-to* technique. The other direction is done by introducing a new *limit* construction of the disclosed set of atoms appearing in the OGS bisimulations, to transform it into a POGS bisimulation.

*Paper outline.* After introducing the λμref-calculus and the λμν-calculus in Section 2, we define the LTS for the OGS in Section 3. The induced trace equivalence coincides with contextual equivalence. We then move to Lassen trees in Section 4, and show that they yield an equivalence that coincides with bipartite bisimilarity in the OGS in Section 5. We discuss related work in Section 6, and present concluding remarks in Section 7. For lack of space, several technical developments are given in [9].

## **2 The** λμref**-calculus and the** λμν**-calculus**

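The syntax of the λμref-calculus is given by the following grammar:

$$\begin{array}{llcl} \text{Values} & V, W & \triangleq & x \mid () \mid \lambda x.M \mid \mathsf{true} \mid \mathsf{false} \mid \ell \\ \text{Terms} & M, N & \triangleq & V \mid \mathsf{let}\ x = M\ \mathsf{in}\ N \mid V\,W \mid \mathsf{if}\ V\ \mathsf{then}\ N_1\ \mathsf{else}\ N_2 \mid V = W \\ & & \mid & \mathsf{new}\ x = V\ \mathsf{in}\ M \mid V := W \mid\ !V \mid \mu c.M \mid [c]M \\ \text{Contexts} & C, C' & \triangleq & \bullet \mid [c]C \mid \mathsf{let}\ x = C\ \mathsf{in}\ M \mid \mathsf{let}\ x = M\ \mathsf{in}\ C \mid \lambda x.C \mid \mu c.C \\ & & \mid & \mathsf{if}\ V\ \mathsf{then}\ C\ \mathsf{else}\ M \mid \mathsf{if}\ V\ \mathsf{then}\ M\ \mathsf{else}\ C \mid \mathsf{new}\ x = V\ \mathsf{in}\ C \\ \text{Evaluation Contexts} & E, E' & \triangleq & [c]\bullet \mid E[\mathsf{let}\ x = \bullet\ \mathsf{in}\ M] \\ \text{Types} & \sigma, \tau & \triangleq & \mathsf{Unit} \mid \mathsf{Bool} \mid \sigma \to \tau \mid \mathbf{ref}\ \sigma \mid \bot \end{array}$$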

with *x* ∈ Vars (variables), c ∈ Covars (continuation variables), ℓ ∈ Locs (locations). We write supp(M) for the set of locations appearing in M, and **FV**(M) for the *free variables* of M. This language has two binders, the standard λ-abstraction, and the μ binder for *continuation variables* c, *d* [22].

**Fig. 1.** λμref: typing rules for terms and evaluation contexts

A store, ranged over by S, T, is a finite mapping from locations to values. S(ℓ) stands for the value associated to ℓ in S. We use the notation S · [ℓ ↦ V] for the extension of S with a mapping for ℓ, which is only defined if ℓ is not defined in S. S[ℓ ↦ V] denotes the store S in which the value associated to ℓ is updated.
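The difference between the two store operations can be sketched in Python with dictionaries. The function names `extend` and `update` are hypothetical, and requiring ℓ to be already present in `update` is an assumption of this sketch.

```python
def extend(store, loc, value):
    """S · [l -> V]: only defined when l is not already defined in S."""
    if loc in store:
        raise KeyError(f"location {loc!r} is already defined")
    return {**store, loc: value}


def update(store, loc, value):
    """S[l -> V]: replace the value stored at an existing location l."""
    if loc not in store:
        raise KeyError(f"location {loc!r} is not defined")
    return {**store, loc: value}
```

Both functions return a new dictionary and leave the original store unchanged, which matches the mathematical reading of stores as finite mappings.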

The operational semantics ↦<sub>op</sub> of the λμref-calculus is defined over *configurations*, which are pairs (M, S) formed by a term and a store. It is given by the following rules:


The typing system for terms is given by the rules in Figure 1. We chose here a typing judgement with a single typing context Γ, so that continuation variables are given types of the shape ¬σ. Such negated types are also used to type evaluation contexts, as specified by the last two rules in Figure 1. While we cannot store a continuation variable c in a reference, we can always store its associated function λ*x*.[c]*x*. Typing rules force terms of type ⊥ to be of the shape [*d*]M, following Parigot's original presentation of the λμ-calculus [22].

We also consider a typing judgement of the shape Σ ⊢ C : (Γ; σ) ⇝ (Δ; τ), for contexts C that take terms M of type Σ; Γ ⊢ M : σ and produce terms of type Σ; Δ ⊢ C[M] : τ. The typing rules defining this judgement are standard and not recalled here.

In the following, we consider the λμν-calculus, the fragment of the λμref-calculus that only handles references of type **ref** Unit. That is, for any reference type **ref** σ appearing in the typing derivation, we have σ = Unit.

We use a, b, ... to range over locations of type **ref** Unit, also called atoms, and introduce the slightly shorter notation new *n* in M to stand for new *n* = () in M in λμν. The syntax for values and terms of the λμν-calculus is thus:

$$\begin{array}{llcl} \text{Values} & V, W & \triangleq & x \mid () \mid \lambda x.N \mid \mathsf{true} \mid \mathsf{false} \mid a \\ \text{Terms} & M, N & \triangleq & V \mid \mathsf{let}\ x = M\ \mathsf{in}\ N \mid V\,W \mid \mathsf{if}\ V\ \mathsf{then}\ N_1\ \mathsf{else}\ N_2 \mid V = W \mid \mathsf{new}\ n\ \mathsf{in}\ M \mid \mu c.M \end{array}$$

In this setting, we see stores S directly as sets of atoms, all mapping to the unit value (). For L a set of atoms, we write -L for the store that maps atoms in L to the unit value ().

We extend the typing judgement to stores S and to value-mapping substitutions γ, respectively, as follows:

$$\frac{\forall \ell \in \mathrm{dom}(S),\ \Sigma; \cdot \vdash S(\ell) : \Sigma(\ell) \qquad \mathrm{dom}(S) = \mathrm{dom}(\Sigma)}{\vdash S : \Sigma} \qquad \frac{\forall x \in \mathrm{dom}(\Gamma),\ \Sigma; \Delta \vdash \gamma(x) : \Gamma(x) \qquad \mathrm{dom}(\gamma) = \mathrm{dom}(\Gamma)}{\Sigma; \Delta \vdash \gamma : \Gamma}$$

**Definition 1.** *A* normal form (M, S) *is a configuration that is irreducible for the reduction relation* ↦<sub>op</sub>*. We write* (M, S) ⇓ N *when there exists a store* T *such that* (M, S) ↦<sup>∗</sup><sub>op</sub> (N, T) *and* (N, T) *is a normal form.*

We call the types Bool, Unit and **ref** σ *positive* types, while σ → τ and ¬σ are called *negative* types. By only allowing free variables of negative types, we can provide a sharp characterization of normal forms.

**Theorem 2.** *Taking a term* M *such that* Σ; Γ ⊢ M : ⊥ *with* Γ *a typing context mapping variables to negative types, if* (M, S) *is in normal form with respect to* ↦<sub>op</sub>*, then* M *is either a named value* [c]V *or a neutral term* E[*x*V]*.*

*Moreover, for any configuration* (M, S) *such that* M *is in* λμν*,* Σ; Γ ⊢ M : ⊥ *and* ⊢ S : Σ*, there exists* N *such that* (M, S) ⇓ N*.*

**Definition 3.** *Taking two terms* M, N *such that* Σ; Γ ⊢ M : σ *and* Σ; Γ ⊢ N : σ*, we say that they are* contextually equivalent*, written* Σ; Γ ⊢ M ≈<sub>ctx</sub> N : σ*, when for all continuation variables* c *and contexts* C *such that* Σ ⊢ C : (Γ; σ) ⇝ (c : ¬Unit; ⊥)*, and for all stores* S *such that* ⊢ S : Σ*, we have* (C[M], S) ⇓ [c]() *if and only if* (C[N], S) ⇓ [c]()*.*

In the definition above, we use λμref contexts to observe λμν terms. Such contexts can use higher-order references, and can thus lead to divergent computations. For this reason, testing for convergence to () is enough when defining ≈<sub>ctx</sub>.

## **3 Operational Game Semantics**

We now introduce a fully-abstract trace semantics for λμref programs. We follow a modular presentation, inspired by the one provided by Laird in [15], where the semantics is built from a synchronization product of three LTS:

- the Interactive LTS (Section 3.3);
- the Typing LTS (Section 3.4);
- the Disclosing LTS (Section 3.5).


#### **3.1 Abstract values**

To represent the interaction between the program and its environment, we distinguish between values that we can observe and values that we can interact with. The two players only exchange observable values, called *abstract values* in this paper. They are defined by the following grammar:

$$\mathsf{A}, \mathsf{B} \triangleq \mathsf{f} \mid \mathsf{a} \mid \mathsf{true} \mid \mathsf{false} \mid ()$$

with f a *function name*, that is, a variable used to represent functions exchanged between the two players. These correspond to the positive part of values, and are also called *ultimate patterns* in [17]. As for terms, supp(A) stands for the set of atoms occurring in A. We consider the typing judgement Δ ⊢ A : σ for abstract values, with σ a positive type, which is defined similarly to the one for terms.

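Then we introduce the abstraction relation ↗ that transforms a value V into a pair (A, γ) formed by an abstract value and a substitution, such that A{γ} = V:

$$\begin{array}{ll} f \nearrow (g, [g \mapsto f]) & f, g \text{ function names} \\ () \nearrow ((), \varepsilon) & \\ b \nearrow (b, \varepsilon) & b \in \{\mathsf{true}, \mathsf{false}\} \\ a \nearrow (a, \varepsilon) & a \text{ an atom} \\ \lambda x.M \nearrow (f, [f \mapsto \lambda x.M]) & \end{array}$$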
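The abstraction relation can be read as a small function that hides the functional part of a value behind a fresh function name. The Python sketch below follows this reading under several assumptions: the classes `Atom`, `FunName` and `Lam` are hypothetical encodings of values, the unit value is encoded as the empty tuple, and the relation's free choice of a function name is resolved by a counter.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class FunName:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


_fresh = itertools.count()


def abstract(v):
    """Return a pair (A, gamma) such that applying gamma to A gives back v."""
    if v == () or isinstance(v, (bool, Atom)):
        return v, {}                   # observable values are exchanged as is
    if isinstance(v, (FunName, Lam)):
        g = FunName(f"g{next(_fresh)}")
        return g, {g: v}               # functions are hidden behind a fresh name
    raise TypeError(f"not a value: {v!r}")


def instantiate(a, gamma):
    """A{gamma}: replace a function name by the value it stands for."""
    return gamma.get(a, a)
```

In this sketch, `instantiate(*abstract(v))` returns `v` for every supported value, mirroring the requirement A{γ} = V.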

### **3.2 Labelled Transition Systems**

The two players, Opponent and Proponent, exchange moves, which are in one of six forms:


We use **m** to range over moves, and **p** (resp. **o**) to range over Proponent (resp. Opponent) moves. Initial questions are the introductory moves. In contrast with other moves, they can introduce multiple abstract values in a row, which is useful to instantiate all the variables of a typing context Γ. They use a distinguished function name ?.

Traces **t** are sequences of moves. We write **m** for the corresponding move with reversed polarity (input switched to output, and vice-versa). We extend this definition to switch traces, written **t**.

The three labelled transition systems we define are instances of the following definition:

**Definition 4.** A labelled transition system (LTS) L is a triple of the form (Confs, Actions, −→). Confs is a set of configurations C, D. Actions is a set of actions **a**, formed by the moves **m**, together with a silent action op, corresponding to internal computations. The relation −→ ⊆ Confs × Actions × Confs is the labelled transition relation. We write C −<sup>**a**</sup>→ D for (C, **a**, D) ∈ −→.

Taking C a configuration of an LTS L, we write **Tr**<sub>L</sub>(C) for the set of traces, that is, sequences of moves, generated by this LTS from C (so with op actions removed). We write C ≈<sub>tr</sub> D for the trace equivalence relation, which equates configurations C, D when both have the same set of traces.
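As a rough illustration of how traces are read off an LTS, the Python sketch below enumerates the traces of a finite toy system while erasing the silent action. The bound `depth`, the encoding of the transition relation as a function `step`, and the two toy systems are assumptions of this sketch; the definition above places no bound on trace length.

```python
OP = "op"  # the silent action


def traces(step, conf, depth):
    """Traces (tuples of moves) generated from conf in at most `depth` transitions.

    `step(conf)` returns a list of (action, next_conf) pairs; op actions are
    dropped from the resulting traces.
    """
    result = {()}
    if depth == 0:
        return result
    for action, nxt in step(conf):
        for t in traces(step, nxt, depth - 1):
            result.add(t if action == OP else (action,) + t)
    return result


def toy_a(conf):
    # 0 --op--> 1 --m--> 2
    return {0: [(OP, 1)], 1: [("m", 2)], 2: []}[conf]


def toy_b(conf):
    # 0 --m--> 1
    return {0: [("m", 1)], 1: []}[conf]
```

The two toy systems differ only by an internal step, so `traces(toy_a, 0, 3)` and `traces(toy_b, 0, 3)` coincide, which is what trace equivalence asks for (here only up to the chosen bound).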

#### **3.3 Interactive LTS**

We consider interactive configurations I, J ∈ IConfs, which are either passive, of the shape ⟨S; γ⟩, or active, of the shape ⟨M; S; γ⟩, with M a term, S a store, and γ a substitution. The Interactive LTS L<sub>I</sub> is then defined as the triple (IConfs, Actions, −→<sub>I</sub>), with the relation −→<sub>I</sub> defined in Figure 2.

The two rules for Proponent moves describe transitions performed by normal forms and make use of the abstraction relation. In the two rules for Opponent, the notation <sup>S</sup> [supp -(A)] stands for S extended with a binding a → () in the case when A = a and a is fresh for Proponent, and simply S otherwise: Proponent extends its store when a new atom is received.

#### **3.4 Typing LTS**

We consider type-context configurations S, T ∈ Confs<sub>Ty</sub>, which are either active, of the shape Δ<sub>O</sub> | ⊥; Δ<sub>P</sub>, or passive, of the shape Δ<sub>O</sub> | Δ<sub>P</sub>, with Δ<sub>O</sub>, Δ<sub>P</sub> two disjoint typing contexts that map variables to negative types.

$$\mathsf{op}\ \frac{(M; S) \longmapsto_{\mathsf{op}} (N; T)}{\langle M; S; \gamma \rangle \xrightarrow{\mathsf{op}}_{\mathsf{I}} \langle N; T; \gamma \rangle}$$

$$\mathsf{PQ}\ \frac{V \nearrow (A; \gamma')}{\langle E[f\,V]; S; \gamma \rangle \xrightarrow{\bar{f}(A, c)}_{\mathsf{I}} \langle S; \gamma \cdot \gamma' \cdot [c \mapsto E] \rangle} \qquad \frac{V \nearrow (A; \gamma')}{\langle [c]V; S; \gamma \rangle \xrightarrow{\bar{c}(A)}_{\mathsf{I}} \langle S; \gamma \cdot \gamma' \rangle}\ \mathsf{PA}$$

**Fig. 2.** Definition of L<sub>I</sub>, the Interactive LTS: transitions of interactive configurations

$$\mathsf{PQ}\ \frac{\Delta_O(f) = \sigma \to \tau \qquad \Delta \vdash A : \sigma}{\Delta_O \mid \bot; \Delta_P \xrightarrow{\bar{f}(A, c)}_{\mathsf{Ty}} \Delta_O \mid \Delta_P, \Delta, c : \neg\tau} \qquad \frac{\Delta_O(c) = \neg\sigma \qquad \Delta \vdash A : \sigma}{\Delta_O \mid \bot; \Delta_P \xrightarrow{\bar{c}(A)}_{\mathsf{Ty}} \Delta_O \mid \Delta_P, \Delta}\ \mathsf{PA}$$

$$\mathsf{OQ}\ \frac{\Delta_P(f) = \sigma \to \tau \qquad \Delta \vdash A : \sigma}{\Delta_O \mid \Delta_P \xrightarrow{f(A, c)}_{\mathsf{Ty}} \Delta_O, \Delta, c : \neg\tau \mid \bot; \Delta_P} \qquad \frac{\Delta_P(c) = \neg\sigma \qquad \Delta \vdash A : \sigma}{\Delta_O \mid \Delta_P \xrightarrow{c(A)}_{\mathsf{Ty}} \Delta_O, \Delta \mid \bot; \Delta_P}\ \mathsf{OA}$$

**Fig. 3.** Definition of L<sub>Ty</sub>, the Typing LTS: transitions of type-context configurations

The Typing LTS L<sub>Ty</sub> is then defined as the triple (Confs<sub>Ty</sub>, Actions, −→<sub>Ty</sub>), with the relation −→<sub>Ty</sub> defined in Figure 3. Notice that the type of the active term is ⊥, since the reduction relation ↦<sub>op</sub> is well-defined only on terms of this type.

Typing configurations can be used to specify interactive configurations, via the following validity judgement.

**Definition 5.** *An interactive configuration* I *is said to be validated by a typing configuration* S*, written* I S*, when:*


## **3.5 Disclosing LTS**

In order to enforce a *non-omniscient* condition on Opponent transitions, we introduce a Disclosing LTS L<sub>Di</sub> ≜ (DConfs, Actions, →<sub>Di</sub>) whose configurations DConfs are pairs of sets of locations ⟨L; D⟩, with D a set of atoms contained in L. The transition relation →<sub>Di</sub> is defined in Figure 4. The condition L ∩ supp(**o**) ⊆ D corresponds to the fact that Opponent cannot play Proponent atoms that have not been disclosed yet, i.e. that are not in D.

32 D. Hirschkoff et al.

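$$\mathsf{op}\ \frac{}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathsf{op}}\_{\mathsf{Di}}\langle\mathsf{L}\cup\mathsf{L}';\mathsf{D}\rangle}$$

$$\mathsf{PQ}/\mathsf{PA}\ \frac{}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathbf{p}}\_{\mathsf{Di}}\langle\mathsf{L};\mathsf{D}\cup\mathsf{supp}(\mathbf{p})\rangle}\qquad\qquad\frac{\mathsf{L}\cap\mathsf{supp}(\mathbf{o})\subseteq\mathsf{D}}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathbf{o}}\_{\mathsf{Di}}\langle\mathsf{L}\cup\mathsf{supp}(\mathbf{o});\mathsf{D}\cup\mathsf{supp}(\mathbf{o})\rangle}\ \mathsf{OQ}/\mathsf{OA}$$

**Fig. 4.** Definition of L<sub>Di</sub>, the Disclosing LTS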
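The three rules of the Disclosing LTS can be illustrated with a minimal Python sketch. It assumes that atoms are strings and that a disclosing configuration is a pair of frozensets `(L, D)`; the function names are hypothetical and only the atom bookkeeping is modelled.

```python
from typing import FrozenSet, Optional, Tuple

Atoms = FrozenSet[str]
DConf = Tuple[Atoms, Atoms]  # (L, D) with D a subset of L


def step_op(conf: DConf, fresh: Atoms) -> DConf:
    """op rule: freshly created atoms join L, the disclosed set D is unchanged."""
    L, D = conf
    return (L | fresh, D)


def step_proponent(conf: DConf, supp_p: Atoms) -> DConf:
    """PQ/PA rule: every atom in the support of the Proponent action becomes disclosed."""
    L, D = conf
    return (L, D | supp_p)


def step_opponent(conf: DConf, supp_o: Atoms) -> Optional[DConf]:
    """OQ/OA rule: allowed only if L ∩ supp(o) ⊆ D; returns None when the move is blocked."""
    L, D = conf
    if not (L & supp_o) <= D:
        return None
    return (L | supp_o, D | supp_o)


start: DConf = (frozenset(), frozenset())
c1 = step_op(start, frozenset({"a"}))               # Proponent creates the atom a
blocked = step_opponent(c1, frozenset({"a"}))       # Opponent cannot play a yet
c2 = step_proponent(c1, frozenset({"a"}))           # Proponent discloses a
allowed = step_opponent(c2, frozenset({"a", "b"}))  # now a, and a new atom b, can be played
```

Here `blocked` is `None` because `a` belongs to L but not to D, which is the non-omniscience condition in action.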

**Definition 6.** *An interactive configuration* I *is said to be validated by a disclosing configuration* D = ⟨L; D⟩*, written* I - D*, if when writing* S *for the store component of* I*, we have* dom(S) = L*.*

#### **3.6 Operational Game Semantics: LTS and Trace Equivalence**

The *Operational Game Semantics* (OGS) LTS L<sub>OGS</sub> ≜ (Confs<sub>ogs</sub>, Actions, →<sub>ogs</sub>) is defined over configurations G, H ∈ Confs<sub>ogs</sub> of the shape (I, S, D), with I validated by both S and D (Definitions 5 and 6), or over initial configurations ⟨Σ; Γ ⊢ M : σ⟩ for Proponent and ⟨c : ¬Unit ⊢ (S; δ) : (Σ; Γ)⟩ for Opponent. Its transition relation is defined by the following rules:

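$$\frac{\mathbb{I}\xrightarrow{\mathbf{a}}\_{\mathsf{I}}\mathbb{J}\qquad\mathbb{S}\xrightarrow{\mathbf{a}}\_{\mathsf{Ty}}\mathbb{T}\qquad\mathbb{D}\xrightarrow{\mathbf{a}}\_{\mathsf{Di}}\mathbb{E}\qquad\mathbb{J}\text{ validated by }\mathbb{T}\qquad\mathbb{J}\text{ validated by }\mathbb{E}}{(\mathbb{I},\mathbb{S},\mathbb{D})\xrightarrow{\mathbf{a}}\_{\mathsf{ogs}}(\mathbb{J},\mathbb{T},\mathbb{E})}$$

$$\frac{\Gamma=\overrightarrow{(x\_i:\sigma\_i)}\qquad\overrightarrow{\Delta\_i\Vdash\mathsf{A}\_i:\sigma\_i}\qquad\mathsf{L}=(\cup\_i\,\mathsf{supp}(\mathsf{A}\_i))\cup\mathsf{dom}(\Sigma)}{\langle\Sigma;\Gamma\vdash\mathsf{M}:\bot\rangle\xrightarrow{?(\overrightarrow{\mathsf{A}\_i})}\_{\mathsf{ogs}}\big(\langle\mathsf{M}\{\overrightarrow{x\_i:=\mathsf{A}\_i}\};\mathsf{L};\varepsilon\rangle,\ \langle\overrightarrow{\Delta\_i}\mid\bot;\varnothing\rangle,\ \langle\mathsf{L};\mathsf{L}\rangle\big)}$$

$$\frac{\Gamma=\overrightarrow{(x\_i:\sigma\_i)}\qquad\overrightarrow{\delta(x\_i)\nearrow(\mathsf{A}\_i;\gamma\_i)}\qquad\overrightarrow{\Delta\_i\Vdash\mathsf{A}\_i:\sigma\_i}\qquad\mathsf{L}=\Sigma^{-1}(\mathbf{ref}\ \mathsf{Unit})}{\langle\mathsf{c}:\neg\mathsf{Unit}\vdash(\mathsf{S};\delta):(\Sigma;\Gamma)\rangle\xrightarrow{?(\overrightarrow{\mathsf{A}\_i})}\_{\mathsf{ogs}}\big(\langle\mathsf{S};\overrightarrow{\gamma\_i}\rangle,\ \langle\mathsf{c}:\neg\mathsf{Unit}\mid\overrightarrow{\Delta\_i}\rangle,\ \langle\mathsf{L};\mathsf{L}\rangle\big)}$$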

The initial question generated by ⟨Σ; Γ ⊢ M : σ⟩ provides a way for Opponent to instantiate variables of Γ with abstract values. In this setting Σ only contains atoms, since M is a term of λμν. The transition for ⟨c : ¬Unit ⊢ (S; δ) : (Σ; Γ)⟩ represents this behavior from the point of view of Opponent. Since contexts belong to λμref, these initial configurations come equipped with an initial store S of type Σ, but only the locations of type **ref** Unit are considered to be disclosed, since the other ones cannot be used by Proponent. The continuation name c is used by Opponent to provide its final answer, which is of type Unit, following the notion of observation used to define contextual equivalence.

We use the notation ⇒<sup>**p**</sup><sub>ogs</sub> to denote a **p** transition preceded by a possibly empty sequence of op transitions. Trace equivalence according to L<sub>OGS</sub> and contextual equivalence coincide.

$$\mathsf{op}\ \frac{(\mathsf{M};\widehat{\mathsf{L}})\longmapsto\_{\mathsf{op}}(\mathsf{N};\widehat{\mathsf{L}'})}{\langle\mathsf{M};\mathsf{L}\rangle\xrightarrow{\mathsf{op}}\_{\mathsf{PI}}\langle\mathsf{N};\mathsf{L}'\rangle}$$

$$\mathsf{PQ}\ \frac{\mathsf{V}\nearrow(\mathsf{A};\gamma)}{\langle\mathsf{E}[\mathsf{f}\,\mathsf{V}];\mathsf{L}\rangle\xrightarrow{\bar{\mathsf{f}}(\mathsf{A},\mathsf{c})}\_{\mathsf{PI}}\langle\mathsf{L};\gamma\cdot[\mathsf{c}\mapsto\mathsf{E}]\rangle}\qquad\qquad\frac{\mathsf{V}\nearrow(\mathsf{A};\gamma)}{\langle[\mathsf{c}]\mathsf{V};\mathsf{L}\rangle\xrightarrow{\bar{\mathsf{c}}(\mathsf{A})}\_{\mathsf{PI}}\langle\mathsf{L};\gamma\rangle}\ \mathsf{PA}$$

**Fig. 5.** Definition of L<sub>PI</sub>: transitions of prime interactive configurations

**Theorem 7.** *Consider two terms* M, N *such that* Σ; Γ ⊢ M, N : σ*. We have* ⟨Σ; Γ ⊢ M : σ⟩ tr ⟨Σ; Γ ⊢ N : σ⟩ *if and only if* Σ; Γ ⊢ M ctx N : σ*.*

Such a full-abstraction theorem was proven in [13] for *RefML*, that is, the intuitionistic fragment of the λμref-calculus, without control operators. It was also proven in [10] for *HOSC*, a variant of the λμref-calculus with the call/cc operator, but without atom disclosure. Since such a full-abstraction result is rather standard, we have chosen to present its proof in [9].

In the remainder of the paper, we focus on the λμν-calculus. In particular, we only consider OGS configurations corresponding to λμν from now on.

## **4 Lassen Trees for the** λμν**-calculus**


#### **4.1** POGS **and** POGS **bipartite bisimulation**

We introduce Lassen trees for terms of the λμν-calculus, as a *linearized* version of L<sub>OGS</sub>, where Opponent can interrogate a name provided by Proponent only once, immediately after it has been introduced. So we consider *prime interactive configurations*, which are either passive, of the shape ⟨L; γ⟩, or active, of the shape ⟨M; L⟩, with M a term, L a set of atoms, and γ a substitution. Compared to interactive configurations, the active configurations do not carry an environment γ. Furthermore, we have a set of atoms rather than a full store, since this LTS is defined only for the λμν-calculus and not for the whole λμref-calculus.

The Prime Interactive LTS, LPI, is then defined as (ConfsPI, Actions, −→PI), with −→PI defined in Figure 5.

The corresponding Typing LTS is defined using the transitions given in Figure 6, which are very close in spirit to the transitions in Figure 3.

The transitions for the Disclosing LTS for POGS are presented in Figure 7. We compare these with the Disclosing LTS for OGS (Figure 4) below.

The Prime Operational Game Semantics LTS is introduced as a synchronization product, together with initial transitions, as for OGS. More precisely, the synchronization between the interactive and typing LTSs requires that active configurations ⟨M; L⟩ correspond to type-contexts of the shape ⟨Δ<sub>O</sub> | ⊥⟩, with Σ; Δ<sub>O</sub> ⊢ M : ⊥ and L̂ : Σ, for some store typing context Σ. Accordingly, for passive configurations ⟨L; γ⟩, we synchronize with ⟨Δ<sub>O</sub> | Δ<sub>P</sub>⟩, and check that Σ; Δ<sub>O</sub> ⊢ γ : Δ<sub>P</sub> and L̂ : Σ, for some store typing context Σ.

$$\mathsf{PQ}\ \frac{\Delta\_{O}(\mathsf{f})=\sigma\to\tau\qquad\Delta\Vdash\mathsf{A}:\sigma}{\langle\Delta\_{O}\mid\bot\rangle\xrightarrow{\bar{\mathsf{f}}(\mathsf{A},\mathsf{c})}\_{\mathsf{PTy}}\langle\Delta\_{O}\mid\Delta,\mathsf{c}:\neg\tau\rangle}\qquad\qquad\frac{\Delta\_{O}(\mathsf{c})=\neg\sigma\qquad\Delta\Vdash\mathsf{A}:\sigma}{\langle\Delta\_{O}\mid\bot\rangle\xrightarrow{\bar{\mathsf{c}}(\mathsf{A})}\_{\mathsf{PTy}}\langle\Delta\_{O}\mid\Delta\rangle}\ \mathsf{PA}$$

**Fig. 6.** Definition of LPTy: transitions of prime type-context configurations

$$\mathsf{op}\ \frac{\mathsf{D}'\subseteq\mathsf{L}'}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathsf{op}}\_{\mathsf{pd}}\langle\mathsf{L}\cup\mathsf{L}';\mathsf{D}\cup\mathsf{D}'\rangle}$$

$$\mathsf{PQ}/\mathsf{PA}\ \frac{\mathsf{supp}(\mathbf{p})\subseteq\mathsf{D}}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathbf{p}}\_{\mathsf{pd}}\langle\mathsf{L};\mathsf{D}\rangle}\qquad\qquad\frac{\mathsf{L}\cap\mathsf{supp}(\mathbf{o})\subseteq\mathsf{D}}{\langle\mathsf{L};\mathsf{D}\rangle\xrightarrow{\mathbf{o}}\_{\mathsf{pd}}\langle\mathsf{L}\cup\mathsf{supp}(\mathbf{o});\mathsf{D}\cup\mathsf{supp}(\mathbf{o})\rangle}\ \mathsf{OQ}/\mathsf{OA}$$

To synchronize with the Disclosing LTS, whose states are of the form ⟨L; D⟩, we simply impose that the L component is the same in the state of L<sub>PI</sub>, both for active and passive configurations.
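The difference with the OGS disclosing rules can be made concrete with a small Python sketch of the op and Proponent rules for POGS. It assumes atoms are strings and configurations are pairs of frozensets; `subsets`, `pd_op` and `pd_proponent` are hypothetical names. The op rule returns one successor per choice of D′ ⊆ L′, and the Proponent rule is blocked unless supp(**p**) ⊆ D.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Optional, Tuple

Atoms = FrozenSet[str]
DConf = Tuple[Atoms, Atoms]  # (L, D)


def subsets(s: Atoms) -> Iterator[Atoms]:
    """Enumerate all subsets of s, smallest first."""
    items = sorted(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pd_op(conf: DConf, fresh: Atoms) -> List[DConf]:
    """op rule: add the fresh atoms L' to L and any chosen D' ⊆ L' to D."""
    L, D = conf
    return [(L | fresh, D | chosen) for chosen in subsets(fresh)]


def pd_proponent(conf: DConf, supp_p: Atoms) -> Optional[DConf]:
    """PQ/PA rule: possible only if supp(p) ⊆ D; the configuration is unchanged."""
    L, D = conf
    return (L, D) if supp_p <= D else None


a = frozenset({"a"})
branches = pd_op((frozenset(), frozenset()), a)
# Branches in which a Proponent move mentioning the atom a is impossible.
stuck = [conf for conf in branches if pd_proponent(conf, a) is None]
```

Creating one atom yields two branches, and the branch that did not declare the atom as disclosed cannot later play it.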

We call L<sub>POGS</sub> the LTS obtained by synchronizing L<sub>PI</sub>, L<sub>PTy</sub> and L<sub>PDi</sub>. We write P, Q ∈ Confs<sub>POGS</sub> for the configurations of L<sub>POGS</sub>. The *Lassen tree* of a term is then defined as the unfolding of L<sub>POGS</sub> from the initial active configuration associated with this term.

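*Example 8.* The Lassen trees (omitting the typing configurations) for [c]new *n* in λ\_.*n* and [c]λ\_.new *n* in *n* are given below. Each node pairs a prime interactive configuration with a disclosing configuration, and each child is prefixed by the action leading to it.

Tree for [c]new *n* in λ\_.*n*:

- (⟨[c]new *n* in λ\_.*n*; ∅⟩, ⟨∅; ∅⟩)
  - c̄(f): (⟨{a}; [f ↦ λ\_.a]⟩, ⟨{a}; ∅⟩)
    - f((), c′): (⟨\[c′\](λ\_.a)(); {a}⟩, ⟨{a}; ∅⟩)
  - c̄(f): (⟨{a}; [f ↦ λ\_.a]⟩, ⟨{a}; {a}⟩)
    - f((), c′): (⟨\[c′\](λ\_.a)(); {a}⟩, ⟨{a}; {a}⟩)
      - c̄′(a): (⟨{a}; ε⟩, ⟨{a}; {a}⟩)

Tree for [c]λ\_.new *n* in *n*:

- (⟨[c]λ\_.new *n* in *n*; ∅⟩, ⟨∅; ∅⟩)
  - c̄(f): (⟨∅; [f ↦ λ\_.new *n* in *n*]⟩, ⟨∅; ∅⟩)
    - f((), c′): (⟨\[c′\](λ\_.new *n* in *n*)(); ∅⟩, ⟨∅; ∅⟩)
      - c̄′(a): (⟨{a}; ε⟩, ⟨{a}; {a}⟩)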

Due to the condition supp(**p**) ⊆ D in the rule for →<sup>**p**</sup><sub>pd</sub>, some configurations with terms in normal form do not have a corresponding Proponent transition. The dashed arrows correspond to op transitions that lead to such stuck configurations.

#### **4.2 Bipartite Bisimulations for** OGS **and** POGS

We consider typed relations on passive and active configurations, that is, we require related configurations to have the same type. This means in particular that the environment components γ of the two configurations have the same domain. In addition to the typing, we also enforce that both sets of disclosed atoms are identical.

**Definition 9.** *A* bipartite bisimulation *is a pair of relations* (RAct, RPas) *respectively on active and passive configurations, such that:*


*An* OGS*-*bipartite bisimulation *is a bipartite bisimulation defined over* L<sub>OGS</sub>*, and a* POGS*-*bipartite bisimulation *is a bipartite bisimulation defined over* L<sub>POGS</sub>*. We write* ogs *and* pogs *for the greatest bipartite bisimulations over* L<sub>OGS</sub> *and* L<sub>POGS</sub>*, respectively.*

The following property follows from the fact that the transition relation is deterministic (up to the choice of fresh names).

**Lemma 10.** ogs *coincides with trace equivalence on* OGS *configurations.*

For op transitions, the difference between OGS and POGS shows up in the disclosing LTS: in →<sup>op</sup><sub>pd</sub>, the component D′ can be chosen non-deterministically. This observation is related to the existential quantification in the second clause of Definition 9. Both in L<sub>OGS</sub> and L<sub>POGS</sub>, there is only one possible next visible (Proponent) move. However, in pogs, the game involves choosing an appropriate set of atoms to be disclosed along →<sup>op</sup><sub>pd</sub> transitions. For instance, when constructing a POGS bipartite bisimulation between the terms new *n* in λ\_.*n* and λ\_.new *n* in *n* from Example 8, we have two choices for the second step:

$$\begin{array}{l} \big((\langle\{\mathsf{a}\};[\mathsf{f}\mapsto\lambda\underline{\;\,}.\mathsf{a}]\rangle,\langle\{\mathsf{a}\};\varnothing\rangle),\ (\langle\varnothing;[\mathsf{f}\mapsto\lambda\underline{\;\,}.\mathsf{new}\ n\ \mathsf{in}\ n]\rangle,\langle\varnothing;\varnothing\rangle)\big) \\ \big((\langle\{\mathsf{a}\};[\mathsf{f}\mapsto\lambda\underline{\;\,}.\mathsf{a}]\rangle,\langle\{\mathsf{a}\};\{\mathsf{a}\}\rangle),\ (\langle\varnothing;[\mathsf{f}\mapsto\lambda\underline{\;\,}.\mathsf{new}\ n\ \mathsf{in}\ n]\rangle,\langle\varnothing;\varnothing\rangle)\big) \end{array}$$

The latter does not satisfy the constraint on the disclosed set, since the sets are not the same in the two configurations. The former leads to a stuck configuration: (⟨\[c′\](λ\_.a)(); {a}⟩, ⟨{a}; ∅⟩) cannot perform any Proponent move. Thus the two programs are not equivalent.

#### **4.3 Deciding** pogs

We now study how to decide when two POGS configurations are bisimilar. First, trees generated by LPOGS are of finite depth.

**Lemma 11.** Taking a POGS configuration G, any trace in **Tr**<sub>POGS</sub>(G) is finite.

This lemma is proven using a biorthogonal logical predicate, following the use of biorthogonality to prove strong normalization of λμ-calculus [23], the computational metalanguage [18], and cut elimination for linear logic [8]. The proof can be found in [9].

Due to the non-determinism of atom generation in ↦<sub>op</sub>, of function name generation in the abstraction of values, and of name picking in Opponent transitions, the trees generated by L<sub>POGS</sub> are infinitely branching. To tame this infinite branching, we see the set of moves Moves and the set of configurations Confs<sub>POGS</sub> of L<sub>POGS</sub> as nominal sets [7] over atoms, function and continuation variables. Taking π a finite permutation over these sets, we write π ∗ *X* for the action of π on the elements of a nominal set *X*. The transition relation →<sub>pogs</sub> of L<sub>POGS</sub> preserves this action, i.e., it is equivariant: if P →<sup>**m**</sup><sub>pogs</sub> Q then, for every finite permutation π, we have π ∗ P →<sup>π∗**m**</sup><sub>pogs</sub> π ∗ Q.
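To make the notion of equivariance concrete, here is a minimal Python sketch that checks it on one instance of the Opponent rule of the disclosing LTS. It assumes names are strings, a finite permutation is a dictionary listing only the names it moves, and data is built from tuples and frozensets; `act` and `opponent_step` are hypothetical helpers, not the full POGS transition relation.

```python
from typing import Any, Dict, FrozenSet, Optional, Tuple

Perm = Dict[str, str]  # finite permutation, given by its non-identity part
DConf = Tuple[FrozenSet[str], FrozenSet[str]]


def act(pi: Perm, x: Any) -> Any:
    """Action of pi: rename every name occurring in x."""
    if x is None:
        return None
    if isinstance(x, str):
        return pi.get(x, x)
    if isinstance(x, tuple):
        return tuple(act(pi, y) for y in x)
    if isinstance(x, frozenset):
        return frozenset(act(pi, y) for y in x)
    raise TypeError(f"unsupported value: {x!r}")


def opponent_step(conf: DConf, supp_o: FrozenSet[str]) -> Optional[DConf]:
    """Opponent rule of the disclosing LTS; None when L ∩ supp(o) ⊆ D fails."""
    L, D = conf
    if not (L & supp_o) <= D:
        return None
    return (L | supp_o, D | supp_o)


swap: Perm = {"a": "b", "b": "a"}
conf: DConf = (frozenset({"a"}), frozenset({"a"}))
move = frozenset({"a", "c"})

# Equivariance on this instance: permuting then stepping equals stepping then permuting.
lhs = opponent_step(act(swap, conf), act(swap, move))
rhs = act(swap, opponent_step(conf, move))
```

Both sides evaluate to the same configuration, which is what allows transitions to be considered up to renaming instead of enumerating every choice of fresh name.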

One can then consider a variant LDPOGS of the POGS LTS which uses the same set of configurations as LPOGS, but whose transition relation −→dpogs chooses fresh atoms and names deterministically. So −→dpogs is then deterministic on op and Proponent actions, and finitely branching on Opponent actions.

We remark at this point that the notion of bipartite bisimulation pogs introduced in Definition 9 is not suited for L<sub>DPOGS</sub>. Indeed, it requires equality of actions in the bisimulation game, and also that configurations related by bisimulation have the same type. So we relax the definition of pogs and work with ternary relations, adding a finite permutation of names and atoms in order to match the actions, rather than enforcing syntactic equality.

**Definition 12.** A relation R ⊆ Confs<sub>POGS</sub> × Confs<sub>POGS</sub> × Perm is said to be valid when, for all ((I, S, ⟨L; D⟩), (J, T, ⟨L′; D′⟩), π) ∈ R, we have T = π ∗ S and D′ = π ∗ D.

**Definition 13.** A relaxed bipartite bisimulation is a pair of valid relations (R<sub>Act</sub>, R<sub>Pas</sub>), respectively on active and passive configurations, such that:


We write <sup>r</sup>pogs for the greatest relaxed bipartite bisimulation over L<sub>POGS</sub>. From the fact that →<sub>pogs</sub> is equivariant, we deduce that <sup>r</sup>pogs and pogs coincide. Since L<sub>DPOGS</sub> generates finite Lassen trees, we deduce that the bisimulation game can be decided.

**Theorem 14.** Taking two POGS configurations P, Q, we can decide if P pogs Q.

#### **4.4 Relating the Transitions in** OGS **and** POGS

To relate the transitions in the OGS and in the POGS, we need to introduce some relations and operations on OGS configurations.

**Definition 15.** *Let* G = (I, S, ⟨L; D⟩) *and* H = (I, S, ⟨L; D′⟩) *be two* OGS *configurations. We write* G ⊆<sub>Di</sub> H *when* D ⊆ D′*.*

When G ⊆Di H, the configurations only differ by their set of disclosed atoms.

**Lemma 16.** *If* G ⊆<sub>Di</sub> H *and* G →<sup>**a**</sup><sub>ogs</sub> G′ *then* H →<sup>**a**</sup><sub>ogs</sub> H′ *and* G′ ⊆<sub>Di</sub> H′*.*

**Lemma 17.** *Let* P *be an active prime configuration. We have the following:*

$$\begin{array}{l} 1.\ \textit{if } \mathbb{P} \xrightarrow{\mathsf{op}}\_{\mathsf{ogs}} \mathbb{P}', \textit{ then } \mathbb{P} \xrightarrow{\mathsf{op}}\_{\mathsf{pogs}} \mathbb{P}'; \\ 2.\ \textit{if } \mathbb{P} \xrightarrow{\mathsf{op}}\_{\mathsf{pogs}} \mathbb{P}', \textit{ then } \mathbb{P} \xrightarrow{\mathsf{op}}\_{\mathsf{ogs}} \subseteq\_{\mathsf{Di}} \mathbb{P}'. \end{array}$$

In POGS, the disclosed set increases in op transitions as seen above, but not in **p** transitions. In a sense, disclosing in OGS is done only when needed, whereas in POGS, disclosing must be declared as soon as the atom is created. This is ensured by the additional condition supp(**p**) ⊆ D in the rule for →<sup>**p**</sup><sub>pd</sub>.

**Lemma 18.** *When* P →<sup>**p**</sup><sub>pogs</sub> P′ *with* P *active, we also have* P →<sup>**p**</sup><sub>ogs</sub> P′*.*

However, the converse does not always hold, specifically if an atom has been declared non-disclosed but still appears in the action **p**. Indeed, the transition (⟨[c]a; L̂; ∅⟩, S, ⟨L; ∅⟩) →<sup>c̄(a)</sup><sub>ogs</sub> (⟨L̂; ∅⟩, S, ⟨L; {a}⟩) is valid for OGS, but has no counterpart in POGS, since ⟨L; ∅⟩ cannot make the transition →<sup>c̄(a)</sup><sub>pd</sub>.

Using the following notion of limit (on OGS configurations), we can intuitively replace D by its minimal extension, preventing this phenomenon from happening.

**Definition 19.** *Given a configuration* G = (I, S, ⟨L; D⟩)*, we define its* limit *as:*

$$\mathbf{lim}(\mathbb{G}) \triangleq \Big(\mathbb{I}, \mathbb{S}, \big\langle \mathsf{L}; \bigcup\_{\mathbf{t} \in \mathsf{Traces}} (\mathsf{L} \cap \mathsf{D}') \big\rangle\Big) \text{ with } \mathbb{G} \xrightarrow{\mathbf{t}}\_{\mathsf{ogs}} (\cdot, \cdot, \langle \cdot; \mathsf{D}' \rangle).$$

We have that G ⊆Di **lim**(G) and **lim** is idempotent. We call *limit configurations* those configurations that are a limit (or alternatively, that are their own limit). Being a limit configuration is preserved by moves but not necessarily by op.
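On a finite, explicitly given transition graph, the disclosed set of the limit can be computed by a plain reachability traversal. The sketch below is only an illustration under strong assumptions: OGS configurations are abstracted to opaque node identifiers, `successors` and `disclosed` are hypothetical dictionaries describing a toy graph, and the empty trace is taken into account, so the starting node contributes its own disclosed atoms.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List


def limit_disclosed(
    start: Hashable,
    atoms: FrozenSet[str],
    successors: Dict[Hashable, List[Hashable]],
    disclosed: Dict[Hashable, FrozenSet[str]],
) -> FrozenSet[str]:
    """Union of L ∩ D' over all configurations reachable from start, where L = atoms."""
    seen = {start}
    queue = deque([start])
    result: FrozenSet[str] = frozenset()
    while queue:
        node = queue.popleft()
        result |= atoms & disclosed[node]
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return result


# Toy graph: g0 -> g1 -> g2, with a loop back from g1 to g0.
successors = {"g0": ["g1"], "g1": ["g2", "g0"]}
disclosed = {
    "g0": frozenset(),
    "g1": frozenset({"a"}),
    "g2": frozenset({"a", "c"}),
}
lim_d = limit_disclosed("g0", frozenset({"a", "b"}), successors, disclosed)
```

In this toy graph only `a` ends up in the limit: `b` is never disclosed, and `c` is not an atom of the starting configuration.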

**Lemma 20.** *Let* P *be a limit configuration. If* P →<sup>**p**</sup><sub>ogs</sub> P′*, then* P →<sup>**p**</sup><sub>pogs</sub> P′*.*

For Opponent transitions, the situation is less simple since not all active OGS configurations are active POGS configurations. To circumvent that issue, we reuse the tensor product from [12]. For two OGS configurations where at least one is passive, we define the tensor product, written ⊗, as follows:

$$(\mathbb{I}, \mathbb{S}, \mathbb{D}) \otimes (\mathbb{J}, \mathbb{T}, \mathbb{E}) = (\mathbb{I} \otimes \mathbb{J}, \mathbb{S} \otimes \mathbb{T}, \mathbb{D} \otimes \mathbb{E})$$

$$\langle \mathsf{S}; \gamma \rangle \otimes \langle \mathsf{S}'; \gamma' \rangle = \langle \mathsf{S} \cup \mathsf{S}'; \gamma \cdot \gamma' \rangle \qquad \langle \mathsf{M}; \mathsf{S}; \gamma \rangle \otimes \langle \mathsf{S}'; \gamma' \rangle = \langle \mathsf{M}; \mathsf{S} \cup \mathsf{S}'; \gamma \cdot \gamma' \rangle$$

$$\langle \mathsf{L}; \mathsf{D} \rangle \otimes \langle \mathsf{L}'; \mathsf{D}' \rangle = \langle \mathsf{L} \cup \mathsf{L}'; \mathsf{D} \cup \mathsf{D}' \rangle \text{ when } \begin{array}{c} \mathsf{D}' \cap \mathsf{L} \subseteq \mathsf{D} \\ \mathsf{D} \cap \mathsf{L}' \subseteq \mathsf{D}' \end{array}$$

The side conditions for the L and D components ensure that no shared atom is disclosed on one configuration but not the other.
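The tensor on disclosing configurations, together with its side conditions, can be sketched in a few lines of Python. Atoms are assumed to be strings and configurations pairs of frozensets; `tensor_disclosing` is a hypothetical name, and the function returns `None` when the tensor is undefined.

```python
from typing import FrozenSet, Optional, Tuple

Atoms = FrozenSet[str]
DConf = Tuple[Atoms, Atoms]  # (L, D)


def tensor_disclosing(left: DConf, right: DConf) -> Optional[DConf]:
    """Tensor of two disclosing configurations; None if the side conditions fail."""
    L1, D1 = left
    L2, D2 = right
    # A shared atom must not be disclosed on one side and undisclosed on the other.
    if not (D2 & L1) <= D1 or not (D1 & L2) <= D2:
        return None
    return (L1 | L2, D1 | D2)


a, ab = frozenset({"a"}), frozenset({"a", "b"})
compatible = tensor_disclosing((a, a), (ab, a))
clash = tensor_disclosing((a, frozenset()), (a, a))
```

In the second call the atom `a` is shared, disclosed on the right and undisclosed on the left, so the tensor is undefined.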

We can then describe an active OGS configuration as the tensor of two POGS configurations (where S = L̂):

$$(\langle \mathsf{M}; \mathsf{S}; \gamma \rangle, \langle \Delta\_O \mid \bot; \Delta\_P \rangle, \langle \mathsf{L}; \mathsf{D} \rangle) = (\langle \mathsf{M}; \mathsf{L} \rangle, \langle \Delta\_O \mid \bot \rangle, \langle \mathsf{L}; \mathsf{D} \rangle) \otimes (\langle \mathsf{L}; \gamma \rangle, \langle \Delta\_O \mid \Delta\_P \rangle, \langle \mathsf{L}; \mathsf{D} \rangle)$$

Finally, we have the following property for opponent transitions:

**Lemma 21.** When P →<sup>**o**</sup><sub>pogs</sub> Q, we have P →<sup>**o**</sup><sub>ogs</sub> Q ⊗ P. When P →<sup>**o**</sup><sub>ogs</sub> G, we have P →<sup>**o**</sup><sub>pogs</sub> Q with G = Q ⊗ P.

## **5 Relating Bisimilarities in** OGS **and** POGS

In this section, we show that pogs can be used to characterize ogs for the limit configurations introduced above. We rely for that on up-to techniques for bipartite bisimulation in OGS, which we introduce first.

#### **5.1 Up-to techniques for** ogs

The proofs in this section use the theory of compatible functions [27,25]. More details can be found in [9].

**Definition 22 (Bipartite bisimulation up-to).** Given a function *f* , a bipartite bisimulation up to *f* is a pair (RAct, RPas) such that:


We then define hide(R<sub>Act</sub>, R<sub>Pas</sub>) ≜ (⊆<sub>Di</sub> R<sub>Act</sub> ⊇<sub>Di</sub>, ⊆<sub>Di</sub> R<sub>Pas</sub> ⊇<sub>Di</sub>). Recall that we still require that hide(R<sub>Act</sub>, R<sub>Pas</sub>) only contains pairs of configurations with the same disclosed set. The soundness of hide can be proved using Lemma 16.

**Lemma 23.** hide is a sound up-to technique, i.e. if (RAct, RPas) is a bisimulation up to hide, then (RAct, RPas) ⊆ogs.

Given a pair of relations (RAct, RPas) on active and passive OGS configurations respectively, we define the following functions:

$$\begin{array}{ll} \texttt{tensor}(\mathscr{R}\_{Act},\mathscr{R}\_{Pas}) \triangleq \big( & \{ (\mathbb{G}\_{1}\otimes\mathbb{G}\_{2},\mathbb{H}\_{1}\otimes\mathbb{H}\_{2}) \ \mathrm{s.t.}\ (\mathbb{G}\_{1},\mathbb{H}\_{1}) \in \mathscr{R}\_{Act}, (\mathbb{G}\_{2},\mathbb{H}\_{2}) \in \mathscr{R}\_{Pas}\}, \\ & \{ (\mathbb{G}\_{1}\otimes\mathbb{G}\_{2},\mathbb{H}\_{1}\otimes\mathbb{H}\_{2}) \ \mathrm{s.t.}\ (\mathbb{G}\_{1},\mathbb{H}\_{1}), (\mathbb{G}\_{2},\mathbb{H}\_{2}) \in \mathscr{R}\_{Pas}\} \ \big) \\ \texttt{split}(\mathscr{R}\_{Act},\mathscr{R}\_{Pas}) \triangleq \big( & \{ (\mathbb{G}\_{1},\mathbb{H}\_{1}) \ \mathrm{s.t.}\ (\mathbb{G}\_{1}\otimes\mathbb{G}\_{2},\mathbb{H}\_{1}\otimes\mathbb{H}\_{2}) \in \mathscr{R}\_{Act} \}, \\ & \{ (\mathbb{G}\_{1},\mathbb{H}\_{1}) \ \mathrm{s.t.}\ (\mathbb{G}\_{1}\otimes\mathbb{G}\_{2},\mathbb{H}\_{1}\otimes\mathbb{H}\_{2}) \in \mathscr{R}\_{Pas}\} \ \big) \end{array}$$

**Lemma 24.** split(ogs) ⊆ogs.

tensor is not a sound up-to technique. It is nevertheless useful to reason about POGS bipartite bisimilar configurations; see Theorem 30 below.

#### **5.2 Properties of the Limit (in** OGS**)**

**Lemma 25 (Monotonicity).** If G is passive and G →<sup>**t**</sup><sub>ogs</sub> H, then there exists G′ such that G ⊗ G′ ⊆<sub>Di</sub> H.

Lemma 25 shows that transitions can only increase the substitution and the store (corresponding to the G′ component), and the set of disclosed atoms (represented by the use of ⊆<sub>Di</sub>). More precisely, ⊆<sub>Di</sub> is required if some atoms from G are disclosed along the trace **t**, in which case new ones can appear in G′.

Lemma 25 is language specific. It does not hold when the language allows the content of the store to be modified (like, e.g. in λμref). Additionally, LTSs enforcing some local restriction on the usage of function or continuation names usually have extra components that are modified along the transitions; we return to this point in Section 7.

In a limit configuration (Definition 19), all atoms that may be disclosed at some point are disclosed. By Lemma 25, these atoms can be disclosed using a single trace.

**Lemma 26.** Given a passive configuration G, there exists a trace **t** and a configuration H such that G →<sup>**t**</sup><sub>ogs</sub> **lim**(G) ⊗ H.

The limit is also useful to relate transitions in OGS and in POGS as follows.

**Lemma 27.** Take a POGS configuration P.

If P is active and P →<sup>**a**</sup><sub>ogs</sub> Q, then **lim**(P) →<sup>**a**</sup><sub>pogs</sub> **lim**(Q).

If P is passive and P →<sup>**o**</sup><sub>ogs</sub> Q ⊗ P, then **lim**(P) →<sup>**o**</sup><sub>pogs</sub> **lim**(Q).

All in all, we obtain that ogs is a congruence for **lim**. For R a relation over configurations, we write **lim**(R) for the set {(**lim**(G), **lim**(H)) | (G, H) ∈ R}.

**Lemma 28.** ogs is closed by computing the limit: **lim**(ogs)⊆ ogs.

The case for passive configurations follows immediately from Lemmas 26 and 24.


The property of the limit might make us think that the disclosure process of an atom could be decided statically, by annotating new syntactically. The following example shows that it is not the case:

λ*b*.new *n*, *m* in λ\_.if *b* then *n* else *m*

Either *n* or *m* will be disclosed depending on the boolean *b* given by Opponent, but never both. So this term is indeed contextually equivalent to λ*b*.new *n* in λ\_.*n*.
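The behaviour of this term can be mimicked with Python closures. This is only an informal simulation, assuming atoms are modelled as freshly generated strings and disclosure as "the atom is returned to the caller"; `new_atom`, `term` and `disclosed_by` are hypothetical names, and the sketch says nothing about contextual equivalence in general.

```python
from itertools import count
from typing import Callable, Set, Tuple

_fresh = count()


def new_atom() -> str:
    """Generate a fresh atom, standing for the `new` construct."""
    return f"a{next(_fresh)}"


def term(b: bool) -> Tuple[Callable[[], str], Set[str]]:
    """Model of the term: create n and m, return the thunk together with the created atoms."""
    n, m = new_atom(), new_atom()
    return (lambda: n if b else m), {n, m}


def disclosed_by(b: bool, calls: int) -> Tuple[Set[str], Set[str]]:
    """Call the thunk several times and collect the atoms that it reveals."""
    thunk, created = term(b)
    disclosed = {thunk() for _ in range(calls)}
    return created, disclosed


created_t, disclosed_t = disclosed_by(True, calls=3)
created_f, disclosed_f = disclosed_by(False, calls=3)
```

For each value of the boolean, two atoms are created but exactly one is ever revealed, and which one depends on a value only known at run time.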

#### **5.3 Correspondence Between** ogs **and** pogs

**Theorem 29 (From** ogs **to** pogs**).** *Consider two* POGS *configurations* P *and* Q*. If* P ogs Q *and* P*,* Q *are both limit configurations, then* P pogs Q*.*

To reason about bisimilar POGS configurations, we use the closure of tensor, written tensor<sup>∗</sup>. Intuitively, tensor<sup>∗</sup>(R<sub>Act</sub>) contains the pairs (G<sub>1</sub> ⊗ G<sub>2</sub>, H<sub>1</sub> ⊗ H<sub>2</sub>) with (G<sub>1</sub>, H<sub>1</sub>) ∈ R<sub>Act</sub> and (G<sub>2</sub>, H<sub>2</sub>) ∈ tensor<sup>∗</sup>(R<sub>Pas</sub>), and tensor<sup>∗</sup>(R<sub>Pas</sub>) contains the pairs (G<sub>1</sub> ⊗ G<sub>2</sub>, H<sub>1</sub> ⊗ H<sub>2</sub>) with (G<sub>1</sub>, H<sub>1</sub>) ∈ R<sub>Pas</sub> and (G<sub>2</sub>, H<sub>2</sub>) ∈ tensor<sup>∗</sup>(R<sub>Pas</sub>).

**Theorem 30 (From** pogs **to** ogs**).** *Suppose* R *is a* POGS *bipartite bisimulation. Then* tensor<sup>∗</sup>(R) *is an* OGS *bipartite bisimulation up to hiding.*

By Lemma 23, Theorem 30 means that if P pogs Q, then P ogs Q.

The correspondence between ogs and pogs is restricted to prime configurations as pogs can only relate those. Having the additional conditions of configurations being limits is enough for our decidability result.

## **6 Related Work**

The ν-calculus was introduced in [24], together with logical relations to reason over contextual equivalence for this language. These logical relations use a Kripke-style definition, worlds being defined as spans of atoms to keep track of the disclosed atoms, similar to the permutation we use in our relaxed bipartite bisimulations. They capture contextual equivalence for programs of *first order* type, but are an incomplete technique for higher-order programs. This entails a decidability result for the first-order fragment of the ν-calculus, since logical relations only quantify over finite objects at first-order types.

Categorical models of the ν-calculus were provided in [29,30], using a representation of name creation via a strong monad. Two examples of such models were given: (i) the functor category *Set*<sup>I</sup> with *I* the category of finite sets and injections; (ii) the category **B***G* of continuous *G*-sets, with *G* the topological group of automorphisms over N. None of these models is fully abstract, since they distinguish new *n* in λ*x*.*x* = *n* from λ*x*.false.

These models were later refined using nominal sets [7], so that types are interpreted via *Fraenkel-Mostowski* sets [28] or domains [14]. Both of these works are continuation models; they might be used to provide a semantics for the λμν-calculus studied in this paper, a direction we wish to explore in future work. Such use of continuations was justified in [28] to provide a model for an extension of the ν-calculus with recursion. More recently, *proof-relevant* logical relations were introduced to deal with recursion in the presence of name generation [4].

In [26], a model of the ν-calculus is given in quasi-Borel spaces, showing a correspondence between random sampling and fresh name generation. This model is shown to be fully-abstract for terms of first-order types.

In [5], environmental bisimulations for the ν-calculus are defined and shown to be fully abstract. Nevertheless, it does not seem possible to extract a decision procedure from that result, since environmental bisimulations are played over a higher-order LTS, that is, an LTS whose actions contain λ-terms. So this LTS is infinitely branching at higher-order types.

Eager normal-form bisimulations have been introduced by Lassen for the call-by-value λ-calculus [16] and λμ-calculus. In [31], a notion of bisimulation similar to ogs is introduced and shown to be fully abstract for an untyped version of λμref. Compared to the standard notion of eager normal form bisimulations, the configurations in the bisimulations in [31] contain an environment similar to the environment component γ of the OGS LTS in Section 3.

In [1], a fully-abstract game model is provided for the ν-calculus. However, this model requires an extensional collapse that is not directly computable at higher-order type. So that model could only be used to prove the decidability of contextual equivalence for terms of first-order types. Enforcing a well-bracketed and visible behavior for Opponent in the OGS model, we believe that our trace model would coincide with the intensional game model of [1]. Nominal game semantics was developed for languages with nominal references and exceptions in [32]. In that setting, algorithmic presentations of game semantics make it possible to provide a classification of decidability of call-by-value languages with (bounded) integer references [19], and ground references [21]. In this setting, the undecidability of contextual equivalence originates from the use of integer references by Proponent. A detailed survey of the literature on contextual equivalence for the ν-calculus is available in [33].

## **7 Conclusion**

To decide the contextual equivalence between two λμν typed terms M and N with contexts in the λμref-calculus, we first construct the corresponding initial configurations, and we can decide by Thm. 14 if they are POGS-bisimilar. This decidability result comes from the fact that the POGS LTS generates finite trees.

Then, we prove in Thm. 29 and Thm. 30 that two initial active configurations are POGS-bisimilar iff they are OGS-bisimilar. This is possible because initial configurations are prime (they are active and γ is empty) and are also limit configurations (their disclosed sets contain all the atoms of the store). In Thm. 7 and Lemma 10, we prove that M and N are contextually equivalent iff the corresponding initial configurations are OGS-bisimilar, which yields decidability.

We now examine the obstacles that remain to prove the decidability of contextual equivalence with contexts in the ν-calculus.

First of all, in that setting, trace equivalence would not be fully-abstract anymore (Thm. 7). Indeed, without integer references, one cannot observe the sequentiality of calls and returns. So an extensional collapse would be necessary.

Another obstacle is that in the absence of higher-order references, Opponent must satisfy a condition of *O-visibility* [2], that corresponds to a local well-scoping discipline, for the function names it is allowed to call. Working in an intuitionistic type system, corresponding to the standard λ-calculus without control operators, the call-and-return discipline of the interaction between Proponent and Opponent has to be *well-bracketed*. These two conditions, namely O-visibility and well-bracketing, can be enforced operationally [13] in the LTS, by keeping track of part of the history of the interaction. However the reduction of ogs to pogs is not possible anymore in that setting. Indeed, the limit over-approximates the set of atoms that can be tested. This can be seen when comparing the programs

new *n* in let = y(λ*z*.*z* = *a*) in *n* and new *n* in let = y(λ*z*.false) in *n*

Assuming *n* is immediately disclosed makes it possible to distinguish the two programs. Because the local conditions of well-bracketing or visibility would prevent Opponent from playing some actions, Opponent could perform irreversible changes that would invalidate Lemma 25. This would make POGS incomplete.

To handle this difficulty, we could try and use Kripke eager normal-form bisimulation [11], using a structure for worlds richer than just a set of atoms.

Finally, in the absence of *full ground references*, which can store locations, atoms played by Opponent would also follow a local well-scoping discipline, but the discriminatory power over Player atoms would also be restricted [20]. In such a setting, the same difficulties as with well-bracketing and O-visibility would arise, and a more complex extensional collapse would be needed.

## **References**



*Conference, TLCA 2005, Nara, Japan, April 21-23, 2005, Proceedings*, volume 3461 of *Lecture Notes in Computer Science*, pages 262–277. Springer, 2005.


33. Nikos Tzevelekos. Program equivalence with names. In Amal Ahmed, Nick Benton, Lars Birkedal, and Martin Hofmann, editors, *Modelling, Controlling and Reasoning About State, 29.08. - 03.09.2010*, volume 10351 of *Dagstuhl Seminar Proceedings*. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Germany, 2010.


## **Kantorovich Functors and Characteristic Logics for Behavioural Distances**

Sergey Goncharov<sup>1,⋆</sup>, Dirk Hofmann<sup>2,⋆⋆</sup>, Pedro Nora<sup>1,⋆⋆⋆</sup> (✉), Lutz Schröder<sup>1,†</sup>, and Paul Wild<sup>1</sup>

<sup>1</sup> Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany sergey.goncharov@fau.de, pedro.nora@fau.de, lutz.schroeder@fau.de, paul.wild@fau.de

<sup>2</sup> Center for Research and Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal dirk@ua.pt

**Abstract.** Behavioural distances measure the deviation between states in quantitative systems, such as probabilistic or weighted systems. There is growing interest in generic approaches to behavioural distances. In particular, coalgebraic methods capture variations in the system type (nondeterministic, probabilistic, game-based etc.), and the notion of *quantale* abstracts over the actual values distances take, thus covering, e.g., two-valued equivalences, (pseudo)metrics, and probabilistic (pseudo)metrics. Coalgebraic behavioural distances have been based either on *liftings* of Set-functors to categories of metric spaces, or on *lax extensions* of Set-functors to categories of quantitative relations. Every lax extension induces a functor lifting, but not every lifting comes from a lax extension. It was shown recently that every lax extension is Kantorovich, i.e. induced by a suitable choice of monotone predicate liftings, implying via a quantitative coalgebraic Hennessy-Milner theorem that behavioural distances induced by lax extensions can be characterized by quantitative modal logics. Here, we essentially show the same in the more general setting of behavioural distances induced by functor liftings. In particular, we show that every functor lifting, and indeed every functor on (quantale-valued) metric spaces, that preserves isometries is Kantorovich, so that the induced behavioural distance (on systems of suitably restricted branching degree) can be characterized by a quantitative modal logic.

## **1 Introduction**

Qualitative transition systems, such as standard labelled transition systems, are typically compared under two-valued notions of behavioural equivalence, such as Park-Milner bisimilarity. For quantitative systems, such as probabilistic, weighted, or metric transition systems, notions of *behavioural distance* allow for a more fine-grained comparison; in particular, they give a numerical measure of the deviation between inequivalent states, instead of just flagging them as inequivalent [14,6,2,24].

<sup>⋆</sup> Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 501369690.

<sup>⋆⋆</sup> Funded by The Center for Research and Development in Mathematics and Applications (CIDMA) through the Portuguese Foundation for Science and Technology (FCT) – project numbers UIDB/04106/2020 and UIDP/04106/2020.

<sup>⋆⋆⋆</sup> Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 259234802.

<sup>†</sup> Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 434050016.

© The Author(s) 2023

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 46–67, 2023. https://doi.org/10.1007/978-3-031-30829-1_3

The variation found in the mentioned system types calls for unifying methods, and correspondingly has given rise to generic notions of behavioural distance based on *universal coalgebra* [33], a framework for state-based systems in which the transition type of systems is encapsulated as an (endo-)functor on a suitable base category. Coalgebraic behavioural distances have been defined on the one hand using *liftings* of given set functors to the category of metric spaces [5], and on the other hand using *lax extensions*, i.e. extensions of set functors to categories of quantitative relations [13,38]. Since every lax extension induces a functor lifting in a straightforward way [38], whereas not every functor lifting is induced by a lax extension, the approach via liftings is more widely applicable. However, it has been shown that every lax extension is *Kantorovich*, i.e. induced by a suitable choice of modalities, modelled as predicate liftings in the spirit of coalgebraic logic [28,34]. Using quantitative coalgebraic Hennessy-Milner theorems, it follows that under expected conditions on the functor and the lax extension, behavioural distance coincides with logical distance.

Roughly speaking, our main contribution in the present paper is to show that the same holds for functor liftings and their induced behavioural distances. In more detail, we have the following (cf. Figure 1 for a graphical summary):


By a recent coalgebraic quantitative Hennessy-Milner theorem that fits this level of generality [12], it follows that given a functor F on (pseudo)metric spaces that preserves isometries, acts non-expansively on morphisms, and admits a dense finitary subfunctor, behavioural distance can be characterized by quantitative modal logic (Corollary 5.10). In additional results, we further clarify the relationship between functor liftings and lax extensions, and in particular characterize the functor liftings that are induced by lax extensions (Theorem 3.18).

Indeed, we conduct the main technical development not only in coalgebraic generality, but also parametric in a quantale, hence abstracting both over distances and over truth values. One benefit of this generality is that our results cover the two-valued case, captured by the two-element quantale. In particular, one instance of our results is the fact that every finitary set functor has a separating set of finitary predicate liftings, and hence admits a modal logic having the Hennessy-Milner property [34]. Moreover, we do not restrict to symmetric distances, and hence cover also simulation preorders and simulation distances [24].

**Fig. 1.** Summary of connections (a rigorous categorical interpretation of these connections involves a square of adjunctions (3)).

*Related Work* Quantale-valued quantitative notions of bisimulation for functors that already live on generalized metric spaces (rather than being lifted from functors on sets) have been considered early on [40]. We have already mentioned previous work on coalgebraic behavioural metrics, for functors originally living on sets, via metric liftings [5] and via lax extensions [13,38]. Existing work that combines coalgebraic and quantalic generality and accommodates asymmetric distances, like the present work, has so far concentrated on establishing so-called van Benthem theorems, concerned with characterizing (coalgebraic) quantitative modal logics by bisimulation invariance [39]. There is a line of work on Kantorovich-type coinductive predicates at the level of generality of topological categories [21,22] (phrased in fibrational terminology), with results including a game characterization and expressive logics for coinductive predicates already assumed to be Kantorovich in a general sense, i.e. induced by variants of predicate liftings. In this work, the condition of preserving isometries already shows up as *fiberedness*, and indeed the condition already appears in work on metric liftings [5]. As mentioned in the above discussion, we complement existing work on quantitative coalgebraic Hennessy-Milner theorems [23,38,12] by establishing the Kantorovich property they assume.

## **2 Preliminaries**

We will need a fair amount of material on coalgebra, quantales and quantale-enriched categories (generalizing metric spaces), predicate liftings, and lax extensions, which we recall in the sequel. New material starts in Section 3.

## **2.1 Categories and Coalgebras**

We assume basic familiarity with category theory [1,4]. More specifically, we make extensive use of topological categories [1] and quantale-enriched categories [26,20,36]. Recall that a *coalgebra* for a functor F: C → C consists of an object X of C, thought of as an object of states, and a morphism α: X → FX, thought of as assigning structured collections (sets, distributions, etc.) of successors to states. A *coalgebra morphism* from (X, α) to (Y, β) is a morphism f ∈ C(X, Y) such that β · f = Ff · α. We will focus on *concrete categories* over Set, that is, categories that come equipped with a faithful functor |−|: C → Set, which allows speaking about individual states as elements of |X|. A *lifting* of an endofunctor F: Set → Set to C is an endofunctor F̄: C → C such that |−| · F̄ = F · |−|.

Example 2.1. Some functors of interest for coalgebraic modelling are as follows.

1. The finite powerset functor P<sub>ω</sub>: Set → Set maps each set to its finite powerset, and for a map g, P<sub>ω</sub>(g) takes direct images under g. Given a set A (of labels), coalgebras for the functor P<sub>ω</sub>(A × −) are finitely branching A-labelled transition systems.

2. The finite distribution functor D<sub>ω</sub>: Set → Set maps a set X to the set D<sub>ω</sub>X of finitely supported probability distributions on X. Given a finite set A, coalgebras for the functor (1 + D<sub>ω</sub>)<sup>A</sup> are probabilistic transition systems [25,10].

Finitary functors are those which are determined by their action on finite sets. More precisely, a functor is finitary if for every set X and every x ∈ FX, there is a finite subset inclusion m: A → X such that x is in the image of Fm.

Standard examples of non-finitary functors are as follows.

3. The (unbounded) powerset functor P: Set → Set.

4. The neighbourhood functor N: Set → Set sends a set X to the set PPX, and a function f : X → Y to the function Nf : NX → NY that assigns to every element x ∈ NX the set {B ⊆ Y | f<sup>−1</sup>B ∈ x}.
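The coalgebra-morphism condition β · f = Ff · α can be made concrete for the functor P<sub>ω</sub>(A × −) of item 1. The following is a minimal sketch, assuming states and labels are plain Python values; the names `LTS`, `lift`, `is_coalgebra_morphism` and the example systems are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

State = Hashable
Label = str
# A coalgebra for P_omega(A x -): every state is sent to a finite set of
# (label, successor) pairs.
LTS = Dict[State, FrozenSet[Tuple[Label, State]]]


def lift(f: Callable[[State], State],
         succs: FrozenSet[Tuple[Label, State]]) -> FrozenSet[Tuple[Label, State]]:
    """Action of P_omega(A x -) on a map f: direct image, labels untouched."""
    return frozenset((a, f(y)) for (a, y) in succs)


def is_coalgebra_morphism(f: Callable[[State], State], alpha: LTS, beta: LTS) -> bool:
    """Check beta . f == F f . alpha pointwise on the states of alpha."""
    return all(beta[f(x)] == lift(f, alpha[x]) for x in alpha)


# Hypothetical example: a two-state a-cycle collapses onto a single a-loop.
alpha: LTS = {
    "x0": frozenset({("a", "x1")}),
    "x1": frozenset({("a", "x0")}),
}
beta: LTS = {"y": frozenset({("a", "y")})}


def collapse(x: State) -> State:
    return "y"


print(is_coalgebra_morphism(collapse, alpha, beta))
```

The check is purely pointwise, so it only applies to finite systems given explicitly as dictionaries.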

#### **2.2 Quantales and Quantale-Enriched Categories**

A central notion of our development is that of a quantale, which will serve as a parameter determining the range of truth values and distances. A *quantale* (V, ⊗, k), more precisely a commutative and unital quantale, is a complete lattice V, with joins and meets denoted by ⋁ and ⋀, respectively, that carries the structure of a commutative monoid with *tensor* ⊗ and *unit* k, such that for every u ∈ V, the map u ⊗ −: V → V preserves suprema. This entails that every u ⊗ − has a right adjoint hom(u, −): V → V, characterized by the property u ⊗ v ≤ w ⇐⇒ v ≤ hom(u, w). We denote by ⊤ and ⊥ the greatest and the least element of a quantale, respectively. A quantale is *non-trivial* if ⊥ ≠ ⊤, and *integral* if ⊤ = k.

Example 2.2. 1. Every frame (i.e. a complete lattice in which binary meets distribute over infinite joins) is a quantale with ⊗ = ∧ and k = ⊤. In particular, every finite distributive lattice is a quantale, prominently 2, the two-element lattice {⊥, ⊤}, and 1, the trivial quantale.

2. Every left continuous t-norm [3] defines a quantale on the unit interval equipped with its natural order.


(Note that the quantalic order here is dual to the standard numeric order).

4. Every commutative monoid (M, ·, e) generates a quantale on PM (the free quantale over M) w.r.t. set inclusion and with the tensor A ⊗ B = {a · b | a ∈ A and b ∈ B}, for all A, B ⊆ M. The unit of this multiplication is the set {e}.
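The adjunction u ⊗ v ≤ w ⇐⇒ v ≤ hom(u, w) can be checked mechanically on a concrete instance. The sketch below assumes the unit interval with truncated addition as tensor, unit k = 0, truncated subtraction as hom, and the quantalic order taken as the dual of the numeric order; this reading of [0, 1]⊕ is an assumption, and the function names are hypothetical. The check runs over a finite grid of exact fractions, so it is a sanity test and not a proof.

```python
from fractions import Fraction
from itertools import product

# Assumed instance: [0, 1] with truncated addition, unit k = 0, and the
# quantalic order being the DUAL of the numeric order.


def tensor(u: Fraction, v: Fraction) -> Fraction:
    """Truncated addition."""
    return min(u + v, Fraction(1))


def hom(u: Fraction, w: Fraction) -> Fraction:
    """Candidate right adjoint of tensor(u, -): truncated subtraction w - u."""
    return max(w - u, Fraction(0))


def leq(u: Fraction, v: Fraction) -> bool:
    """Quantalic order: u <= v iff u >= v numerically."""
    return u >= v


grid = [Fraction(i, 10) for i in range(11)]

# u (x) v <= w  <=>  v <= hom(u, w), tested on all grid triples.
adjunction_holds = all(
    leq(tensor(u, v), w) == leq(v, hom(u, w))
    for u, v, w in product(grid, repeat=3)
)
print(adjunction_holds)
```

Under this reading the greatest element ⊤ is the number 0, which coincides with the unit k, so the instance is integral.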

A V*-category* is a pair (X, a) consisting of a set X and a map a: X × X → V such that k ≤ a(x, x) and a(x, y) ⊗ a(y, z) ≤ a(x, z) for all x, y, z ∈ X. We view a as a (not necessarily symmetric) *distance* function, noting however that objects with 'greater' distance should be seen as being closer together. A V-category (X, a) is *symmetric* if a(x, y) = a(y, x) for all x, y ∈ X. Every V-category (X, a) carries a *natural order* defined by x ≤ y whenever k ≤ a(x, y), which induces a faithful functor V-Cat → Ord. A V-category is *separated* if its natural order is antisymmetric. A V*-functor* f : (X, a) → (Y, b) is a map f : X → Y such that, for all x, y ∈ X, a(x, y) ≤ b(f(x), f(y)). V-categories and V-functors form the category V-Cat, and we denote by V-Cat<sub>sym</sub> the full subcategory of V-Cat determined by the symmetric V-categories and by V-Cat<sub>sym,sep</sub> the full subcategory of V-Cat<sub>sym</sub> determined by the separated symmetric V-categories.
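For a finite carrier, the two V-category axioms and the natural order can be checked directly. The sketch below again assumes the instance [0, 1] with truncated addition and the dual of the numeric order (so k = 0, and k ≤ a(x, x) reads a(x, x) = 0); the helper names and the two-point example are hypothetical.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)


def is_v_category(points, a) -> bool:
    """Axioms k <= a(x, x) and a(x, y) (x) a(y, z) <= a(x, z), read in [0, 1]
    with truncated addition and the dual of the numeric order (assumed)."""
    reflexive = all(a[(x, x)] == 0 for x in points)
    triangle = all(
        min(a[(x, y)] + a[(y, z)], ONE) >= a[(x, z)]
        for x, y, z in product(points, repeat=3)
    )
    return reflexive and triangle


def natural_order(points, a):
    """x <= y whenever k <= a(x, y), i.e. a(x, y) == 0 numerically."""
    return {(x, y) for x, y in product(points, repeat=2) if a[(x, y)] == 0}


# Hypothetical asymmetric example: p is 'below' q, but not conversely.
points = ["p", "q"]
a = {
    ("p", "p"): Fraction(0), ("q", "q"): Fraction(0),
    ("p", "q"): Fraction(0), ("q", "p"): Fraction(1, 2),
}
print(is_v_category(points, a))
print(sorted(natural_order(points, a)))
```

The example is not symmetric, which illustrates why the natural order is in general only a preorder and not an equivalence relation.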

*Example 2.3.* 1. The category 1-Cat is equivalent to the category Set of *sets* and functions.

2. The category 2-Cat is equivalent to the category Ord of *preordered sets* and monotone maps.

3. Metric, ultrametric and bounded metric spaces à la Lawvere [26] can be seen as quantale-enriched categories:


4. Categories enriched in a free quantale PM on a monoid M can be interpreted as sets equipped with a non-deterministic M-valued structure.

We focus on V = 2 and V = [0, 1]⊕, which we will use to capture classical (qualitative) and metric (quantitative) aspects of system behaviour, respectively. Table 1 provides some instances of generic quantale-based concepts (either introduced above or to be introduced presently) in these two cases, for further reference.


**Table 1.** V-categorical notions in the qualitative and the quantitative setting. The prefix 'pseudo' refers to absence of separatedness, and the prefix 'hemi' additionally indicates absence of symmetry.

A V-category (X, a) is *discrete* if a = 1<sub>X</sub>, and *indiscrete* if a(x, y) = ⊤ for all x, y ∈ X. The *dual* of (X, a) is the V-category (X, a)<sup>op</sup> = (X, a<sup>◦</sup>) given by a<sup>◦</sup>(x, y) = a(y, x). Given a set X and a *structured cone*, i.e. a family (f<sub>i</sub> : X → |(X<sub>i</sub>, a<sub>i</sub>)|)<sub>i∈I</sub> of maps into V-categories (X<sub>i</sub>, a<sub>i</sub>), the *initial structure* a: X × X → V on X is defined by a(x, y) = ⋀<sub>i∈I</sub> a<sub>i</sub>(f<sub>i</sub>(x), f<sub>i</sub>(y)), for all x, y ∈ X. A cone ((X, a) → (X<sub>i</sub>, a<sub>i</sub>))<sub>i∈I</sub> is said to be *initial* (w.r.t. the forgetful functor |−|: V-Cat → Set) if a is the initial structure w.r.t. the structured cone (X → |(X<sub>i</sub>, a<sub>i</sub>)|)<sub>i∈I</sub>; a V-functor is *initial* if it forms a singleton initial cone. For every V-category (X, a) and every set S, the S*-power* (X, a)<sup>S</sup> is the V-category consisting of the set of all functions from S to X, equipped with the V-category structure [−, −] given by [f, g] = ⋀<sub>s∈S</sub> a(f(s), g(s)), for all f, g : S → X. By equipping its hom-sets with the substructure of the appropriate power, the category V-Cat becomes V-Cat-enriched and, hence, also Ord-enriched w.r.t. the corresponding natural order of V-categories. We say that an endofunctor on V-Cat is *locally monotone* if it preserves this preorder.
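The initial structure is a pointwise meet over the cone, which is easy to compute for finite data. The following sketch assumes the instance [0, 1] with the dual of the numeric order, so that the quantalic meet is the numeric maximum and the empty meet is ⊤ = 0; `initial_structure` and the example cone are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def initial_structure(points, cone):
    """Initial structure on `points` w.r.t. a structured cone.

    `cone` is a list of pairs (f, a_i): f is a dict X -> X_i and a_i a dict on
    X_i x X_i. Under the assumed dual order on [0, 1], the quantalic meet is
    the numeric maximum, and the empty meet is the top element 0.
    """
    return {
        (x, y): max((a_i[(f[x], f[y])] for f, a_i in cone), default=Fraction(0))
        for x, y in product(points, repeat=2)
    }


# Target: three values of [0, 1] with hom(u, v) = truncated subtraction v - u.
vals = [Fraction(0), Fraction(1, 4), Fraction(1, 2)]
hom = {(u, v): max(v - u, Fraction(0)) for u in vals for v in vals}

f1 = {"x": Fraction(0), "y": Fraction(1, 2)}
f2 = {"x": Fraction(1, 4), "y": Fraction(0)}

a = initial_structure(["x", "y"], [(f1, hom), (f2, hom)])
print(a[("x", "y")], a[("y", "x")])
```

With an empty cone the function returns the indiscrete structure, matching the convention that an empty meet is the greatest element.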

Remark 2.4. Let us briefly outline the connections between V-Cat and V-Cat<sub>sym</sub>, which for real-valued V correspond to hemimetric and pseudometric spaces, respectively. By virtue of the above construction of initial structures, the categories V-Cat and V-Cat<sub>sym</sub> are topological over Set [1]; in particular, both categories are complete and cocomplete. Moreover, V-Cat<sub>sym</sub> is a (reflective and) coreflective full subcategory of V-Cat. The coreflector (−)<sup>s</sup> : V-Cat → V-Cat<sub>sym</sub> is identity on morphisms and sends every (X, a) to its symmetrization, the V-category (X, a<sup>s</sup>) where a<sup>s</sup>(x, y) = a(x, y) ∧ a(y, x) (keep in mind that in Example 2.2.3, the order is the dual of the numeric order).

Finally, we note that for every quantale V, (V, hom) is a V-category, which for simplicity we also denote by V. The following result records two fundamental properties of the V-category V.

**Proposition 2.5.** The V-category V = (V, hom) is injective w.r.t. initial morphisms, and for every V-category X, the cone (f : X → V)<sub>f</sub> of all V-functors from X to V is initial.

#### **2.3 Predicate Liftings**

Given a cardinal κ and a V-category X, a κ-ary X*-valued predicate lifting* for a functor F: V-Cat → V-Cat is a natural transformation λ: V-Cat(−, X<sup>κ</sup>) → V-Cat(F−, X). When V is the trivial quantale, we identify an X-valued predicate lifting with a natural transformation λ: Set(−, X<sup>κ</sup>) → Set(F−, X) via the isomorphism Set ≅ 1-Cat. In this case, we are primarily interested in predicate liftings valued in the underlying set of another quantale, and we say that such a predicate lifting is *monotone* if each of its components is a monotone map w.r.t. the pointwise order induced by that quantale.

*Remark 2.6.* By the Yoneda lemma, every κ-ary X-valued predicate lifting for a functor F: V-Cat → V-Cat is determined by a V-functor FX<sup>κ</sup> → X. In particular, the collection of all X-valued κ-ary predicate liftings for a functor is a set.

*Example 2.7.* 1. The Kripke semantics of the standard diamond modality ♦ of the modal logic K is induced (in a way recalled in Section 5) by the unary predicate lifting ♦<sub>X</sub>(A) = {B ⊆ X | A ∩ B ≠ ∅} for the (finite) powerset functor (modulo the isomorphism PX ≅ Set(X, 2)).

2. Computing the expected value for a given [0, 1]-valued function with respect to each probability distribution defines a unary [0, 1]-valued predicate lifting for the functor D<sub>ω</sub>: Set → Set, which we denote by E.
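Both predicate liftings of Example 2.7 are directly computable on finite sets. The sketch below represents predicates as subsets (for ♦) or dictionaries (for E) and brute-forces one naturality square for ♦; all function names and the example data are hypothetical, and the naturality test covers only the one map g chosen here.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(xs):
    """All subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))]


def diamond(X, A):
    """Diamond_X(A): all subsets B of X that meet the predicate A."""
    return {B for B in powerset(X) if A & B}


def expectation(f, mu):
    """E_X(f)(mu): expected value of f under a finitely supported distribution mu."""
    return sum(p * f[x] for x, p in mu.items())


# Naturality of diamond for one map g: X -> Y and one predicate A on Y:
# B lies in Diamond_X(g^{-1} A) iff its direct image g[B] lies in Diamond_Y(A).
X, Y = {1, 2, 3}, {"u", "v"}
g = {1: "u", 2: "u", 3: "v"}
A = frozenset({"v"})
preimage = frozenset(x for x in X if g[x] in A)
lhs, rhs = diamond(X, preimage), diamond(Y, A)
natural = all((B in lhs) == (frozenset(g[x] for x in B) in rhs) for B in powerset(X))
print(natural)

# Expected value of a hypothetical 0/1-valued predicate under a fair coin.
coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
heads = {"h": Fraction(1), "t": Fraction(0)}
print(expectation(heads, coin))
```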

## **2.4 Quantale-Enriched Relations and Lax Extensions**

The structure of a quantale-enriched category is a particular kind of "enriched relation". For a quantale V and sets X and Y, a V*-relation* from X to Y is a map r : X × Y → V; we then write r : X −→ Y. As for ordinary relations, a pair of V-relations r : X −→ Y and s: Y −→ Z can be composed via "matrix multiplication": (s · r)(x, z) = ⋁<sub>y∈Y</sub> r(x, y) ⊗ s(y, z) for x ∈ X, z ∈ Z. With this composition, the collection of all sets and V-relations between them forms a category, denoted V-Rel. The identity morphism on a set X is the V-relation 1<sub>X</sub> : X −→ X that sends every diagonal element to k and all the others to ⊥.
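The "matrix multiplication" of V-relations can be sketched for finite sets as follows, assuming once more the instance [0, 1] with truncated addition and the dual of the numeric order: the quantalic join then becomes a numeric minimum, k is 0, and ⊥ is 1. The names `compose` and `identity` and the example relations are hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)


def compose(r, s, X, Y, Z):
    """(s . r)(x, z) = join over y of r(x, y) (x) s(y, z).

    Assumed instance: truncated addition as tensor and the dual numeric order,
    so the join is a numeric minimum and the empty join is bottom = 1.
    """
    return {
        (x, z): min((min(r[(x, y)] + s[(y, z)], ONE) for y in Y), default=ONE)
        for x in X for z in Z
    }


def identity(X):
    """1_X: diagonal elements go to k = 0, all others to bottom = 1."""
    return {(x, y): Fraction(0) if x == y else ONE for x in X for y in X}


X, Y, Z = ["x"], ["y1", "y2"], ["z"]
r = {("x", "y1"): Fraction(1, 4), ("x", "y2"): Fraction(1, 2)}
s = {("y1", "z"): Fraction(1, 2), ("y2", "z"): Fraction(0)}

print(compose(r, s, X, Y, Z))
```

In this reading, composition picks the cheapest path through an intermediate point, which is the usual shortest-path flavour of metric-valued relations.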

*Example 2.8.* The category of 2-relations is the usual category Rel of sets and relations. Quantitative or "fuzzy" relations are usually defined as [0, 1]⊕-relations (e.g. [38,5]).

The category V-Rel comes with an involution (−)<sup>◦</sup> : V-Rel<sup>op</sup> → V-Rel that maps objects identically and sends a V-relation r : X −→ Y to the V-relation r<sup>◦</sup> : Y −→ X given by r<sup>◦</sup>(y, x) = r(x, y), the *converse* of r. Moreover, by equipping its hom-sets with the pointwise order induced by V, V-Rel is made into a quantaloid (e.g. [31]), i.e. enriched over complete join semilattices. This entails that there is an optimal way of extending a V-relation r : X −→ Y along a V-relation s: X −→ Z: the (Kan) *extension* of r along s is the V-relation r ⟜ s: Z −→ Y defined by the property t · s ≤ r ⇐⇒ t ≤ r ⟜ s, for all t: Z −→ Y.

A *lax extension*<sup>1</sup> of a functor F: Set → Set to V-Rel is a lax functor F̂: V-Rel → V-Rel that agrees with F on sets and whose action on functions is compatible with F. To make the latter requirement precise, we note that a function is interpreted as the V-relation that sends every element of its graph to k and all the others to ⊥; then, a lax extension of F to V-Rel, or simply a lax extension, is a map (r : X −→ Y) ⟼ (F̂r : FX −→ FY) such that:


for all r : X −→ Y, s: Y −→ Z and f : X → Y.

<sup>1</sup> Extensions of Set-functors to Rel are also commonly referred to as "relators", "relational liftings" or "lax relational liftings".
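For orientation, the conditions usually imposed on a lax extension in the literature on quantale-valued relations can be written as follows. This is a sketch of the standard axioms; the labels (L1) to (L3), the hat notation, and the auxiliary relation r′ : X −→ Y are assumptions and may differ from the formulation intended here.

$$
\begin{aligned}
\text{(L1)}\quad & r \le r' \implies \widehat{\mathsf{F}} r \le \widehat{\mathsf{F}} r', \\
\text{(L2)}\quad & \widehat{\mathsf{F}} s \cdot \widehat{\mathsf{F}} r \le \widehat{\mathsf{F}}(s \cdot r), \\
\text{(L3)}\quad & \mathsf{F} f \le \widehat{\mathsf{F}} f \quad\text{and}\quad (\mathsf{F} f)^{\circ} \le \widehat{\mathsf{F}}(f^{\circ}).
\end{aligned}
$$

Read this way, (L1) is monotonicity, (L2) is lax preservation of composition, and (L3) expresses compatibility with the action of F on functions.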

*Example 2.9.* The generalized "lower-half" Egli-Milner order between powersets, which for a relation r : X −→ Y is defined as the relation P̂r : PX −→ PY given by

$$A(\widehat{\mathsf{P}}r)B \iff \forall a \in A. \exists b \in B. \, a \, r \, b,$$

defines a lax extension of the powerset functor P: Set → Set to Rel. Similarly, the generalized "upper-half" and the generalized Egli-Milner order define lax extensions of the powerset functor to Rel.
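The "lower-half" order of Example 2.9 and its lax compatibility with composition can be brute-forced on small sets. The sketch below represents relations as sets of pairs; the function names and the example relations are hypothetical, and the check covers only the two relations chosen here.

```python
from itertools import chain, combinations


def powerset(xs):
    """All subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))]


def lower_egli_milner(r, A, B) -> bool:
    """A is related to B iff every a in A is r-related to some b in B."""
    return all(any((a, b) in r for b in B) for a in A)


def compose_rel(r, s):
    """Ordinary relational composition: first r, then s."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


X, Y, Z = [0, 1], ["a", "b"], ["u", "v"]
r = {(0, "a"), (1, "b")}
s = {("a", "u")}

# Lax preservation of composition: related via r and then via s implies
# related via the composite s . r.
lax = all(
    lower_egli_milner(compose_rel(r, s), A, C)
    for A in powerset(X) for B in powerset(Y) for C in powerset(Z)
    if lower_egli_milner(r, A, B) and lower_egli_milner(s, B, C)
)
print(lax)
```

Note that the empty set is related to every set under this order, since the universal quantifier over A is vacuous.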

Lax extensions are deeply connected with monotone predicate liftings. To realize this, it is convenient to think of the X-component of a κ-ary predicate lifting as a map of type V-Rel(κ, X) → V-Rel(1, FX) [16].<sup>2</sup>

**Definition 2.10.** A κ-ary predicate lifting λ for a functor F: Set → Set is *induced* by a lax extension F̂: V-Rel → V-Rel if there is a V-relation r: 1 −→ Fκ such that λ(f) = F̂f · r, for every V-relation f : κ −→ X.

*Example 2.11.* By interpreting a subset of a set X as a relation from 1 to X, the unary predicate lifting ♦ (see Example 2.7) for the powerset functor P: Set → Set is induced by the lax extension of Example 2.9; indeed, it is determined by the map 1 → P1 that selects the set 1.

*Remark 2.12.* Every predicate lifting induced by a lax extension is monotone.

Lax extensions have been instrumental in coalgebraic notions of *behavioural distance* (e.g. [13,38,39]), and the notion of Kantorovich extension has been crucial to connect such notions with coalgebraic modal logic [7].

**Definition 2.13.** Let F: Set → Set be a functor, and Λ a *class* of monotone predicate liftings for F. The *Kantorovich* lax extension of F w.r.t. Λ is the lax extension F̂<sup>Λ</sup> = ⋀<sub>λ∈Λ</sub> F̂<sup>λ</sup>, where for every V-relation r : X −→ Y, the V-relation F̂<sup>λ</sup>r : FX −→ FY is given by F̂<sup>λ</sup>r = ⋀<sub>g : κ −→ X</sub> λ(r · g) ⟜ λ(g), with ⟜ denoting the extension of V-relations.

<sup>2</sup> Note that Goncharov et al. consider as their main point of view the dual of the one considered here [16, Proposition 4.2]. Our choice prevents a harmless mismatch between the Kantorovich liftings and Kantorovich extensions in Theorem 3.9.

*Example 2.14.* The Kantorovich extension of the powerset functor P: Set → Set to Rel w.r.t. the ♦ predicate lifting coincides with the extension given by the "lower-half" of the Egli-Milner order (Example 2.9).

As suggested by the previous example, the Kantorovich extension leads to a representation theorem that plays an important role in Section 3.2.

**Theorem 2.15 ([16]).** *Let* F̂: V*-*Rel → V*-*Rel *be a lax extension, and let* Λ *be the class of all predicate liftings induced by* F̂*. Then,* F̂ = F̂<sup>Λ</sup>*.*

## **3 Topological Liftings**

It is well-known that every lax extension F̂: V-Rel → V-Rel of a functor F: Set → Set gives rise to a lifting (which we denote by the same symbol) of F to V-Cat (for instance, see [37]). By definition, liftings are completely determined by their action on objects. In particular, the *lifting induced by a lax extension* F̂: V-Cat → V-Cat sends a V-category (X, a) to the V-category (FX, F̂a). Of course, it does not make sense to talk about functor liftings to the category V-Cat when V is trivial, hence we assume from now on that V *is non-trivial*.

Predicate liftings also induce functor liftings, via a simple construction available on all topological categories that goes back, at least, to work in categorical duality theory [11,29]: To lift a functor G: A → Y along a topological functor |−|: B → Y, it is enough to give, for every object A in A, a structured cone

$$\mathcal{C}(A) = (\mathsf{G}A \xrightarrow{h} |B|)_{h,B} \tag{1}$$

so that, for every h in C(A) and every f : A′ → A, the composite h · Gf belongs to the cone C(A′). Then, for an object A in A, one defines G<sup>I</sup>A by equipping GA with the initial structure w.r.t. the structured cone (1). It is easy to see that the assignment X ↦ G<sup>I</sup>X indeed defines a functor G<sup>I</sup> : A → B such that |−| · G<sup>I</sup> = G. This technique has been previously applied in the context of *codensity liftings* [21,22,35,19] and *Kantorovich liftings* [5]. We apply this to our situation as follows. Given a functor F: Set → Set, take G = F · |−|; then a lifting of F to V-Cat can be specified by a *class* of natural transformations

$$\lambda \colon \mathcal{V}\text{-}\mathsf{Cat}(-, A_{\lambda}) \longrightarrow \mathsf{Set}(\mathsf{F}|-|, |B_{\lambda}|), \tag{2}$$

(which may be thought of as generalized predicate liftings, in that they lift A<sub>λ</sub>-valued predicates to B<sub>λ</sub>-valued ones). Namely, given a V-category X, we consider the structured cone consisting of all maps

$$
\lambda(f) \colon \mathsf{F}|X| \longrightarrow |B_{\lambda}|,
$$

where λ ranges over the given natural transformations and f over all V-functors X → A<sub>λ</sub>. As described above, we obtain a V-category F<sup>I</sup>X by equipping F|X| with the initial structure w.r.t. this cone. We call functor liftings constructed in this way *topological*. Indeed, it turns out that *every* functor lifting is topological, even when one restricts B<sub>λ</sub> in (2) to be the V-category (V, hom):

**Theorem 3.1.** *Every lifting of a* Set*-functor to* V*-*Cat *is topological w.r.t. a* class *of natural transformations* λ: V*-*Cat(−, A<sub>λ</sub>) −→ Set(F|−|, |V|).

In examples, we usually construct a generalized predicate lifting (2) from a κ-ary predicate lifting λ for the set functor F: Choose a pair (A, B) of V-categories over the sets V<sup>κ</sup> and V, respectively (the above theorem allows restricting to B = V, and the examples we present are of this kind). We can then precompose λ with the inclusion natural transformation V-Cat(−, A) −→ Set(|−|, |A|), obtaining a natural transformation λ(A,B) : V-Cat(−, A) → Set(F|−|, |B|) that applies λ to maps underlying V-functors with codomain A.

*Example 3.2.* 1. The discrete lifting of the identity functor Id: Set → Set, which sends every V-category to the discrete V-category with the same underlying set, can be obtained as a topological lifting constructed from the identity V-valued predicate lifting for Id by choosing A to be the V-category consisting of the set V equipped with the indiscrete structure.

2. The lifting of the identity functor Id: Set → Set to Ord that computes the smallest equivalence relation that contains a given preorder can be obtained as a topological lifting constructed from the 2-valued identity predicate lifting for Id by choosing A to be the discrete preordered set with two elements.

3. It is well-known that the total variation distance between finite distributions μ, υ on a set X coincides with the Kantorovich distance on the discrete bounded-by-1 metric space X (e.g. [15]); that is, d<sub>TV</sub>(μ, υ) = sup<sub>f : X→[0,1]</sub> E<sub>X</sub>(f)(υ) ⊖ E<sub>X</sub>(f)(μ), with ⊖ denoting truncated subtraction (see Example 2.7(2)). Therefore, the total variation distance defines a lifting of the finite distribution functor to BHMet that can be obtained as the topological lifting constructed from the predicate lifting E by choosing A to be the indiscrete space [0, 1]. This example is closely related to the first one. Indeed, this lifting is the composite of the Kantorovich lifting of the finite distribution functor to BHMet (see Example 3.5) and the discrete lifting of the identity functor to BHMet. By Theorem 3.9 below, precomposing functor liftings with the discrete lifting of the identity functor can be used to derive non-Kantorovich liftings.
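The total variation formula of item 3 can be evaluated by brute force on a small finite set. The sketch below assumes that the supremum may be restricted to 0/1-valued functions f, which holds because the objective is convex in f and the cube [0, 1]<sup>X</sup> has the 0/1-valued functions as its vertices; it then compares the result with the closed form that sums the positive parts of υ(x) − μ(x). The function names and the two example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expectation(f, mu):
    """E_X(f)(mu): expected value of f under a finitely supported distribution mu."""
    return sum(p * f[x] for x, p in mu.items())


def tv_via_predicates(mu, nu, X):
    """sup over f: X -> {0, 1} of E(f)(nu) (-) E(f)(mu), with truncated subtraction.

    Starting the running maximum at 0 implements the truncation.
    """
    best = Fraction(0)
    for bits in product((0, 1), repeat=len(X)):
        f = dict(zip(X, bits))
        best = max(best, expectation(f, nu) - expectation(f, mu))
    return best


def tv_closed_form(mu, nu, X):
    """Sum of the positive parts of nu(x) - mu(x)."""
    return sum(max(nu.get(x, 0) - mu.get(x, 0), 0) for x in X)


X = ["x", "y", "z"]
mu = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
nu = {"x": Fraction(1, 4), "z": Fraction(3, 4)}

print(tv_via_predicates(mu, nu, X), tv_closed_form(mu, nu, X))
```

The enumeration is exponential in the size of X, so it is only meant as a sanity check on toy examples.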

*Remark 3.3.* Theorem 3.1 can be fine-tuned to show that the discrete lifting F<sup>d</sup> : Ord → Ord of a finitary functor F: Set → Set is a topological lifting constructed from a set Λ of finitary 2-valued predicate liftings for F. Hence, for every set X, considered as a discrete preordered set, we have that the cone of all maps λ(f): F<sup>d</sup>(X, 1<sub>X</sub>) → 2, for κ-ary predicate liftings λ ∈ Λ and maps X → 2<sup>κ</sup>, is initial. Thus, as F<sup>d</sup>(X, 1<sub>X</sub>) is antisymmetric, this cone is mono. In this sense, our results subsume the result that every finitary Set-functor admits a separating set of finitary predicate liftings [34].

## **3.1 Kantorovich Liftings**

For our present purposes, we are primarily interested in topological liftings induced by predicate liftings in the standard sense, i.e. the natural transformations (2) are of the shape λ: V-Cat(−, V<sup>κ</sup>) → Set(F|−|, |V|), and thus employ V, equipped with its standard V-category structure, as the object of truth values throughout. In particular, this format is needed to use predicate liftings as modalities in existing frameworks for quantitative coalgebraic logic (Section 5). Many functor liftings considered in work on coalgebraic behavioural distance can be understood as topological liftings constructed in this way (e.g. [5,22,38,39,12]). To simplify notation, in the sequel we often omit the forgetful functor to Set.

**Definition 3.4.** Let F: Set → Set be a functor and Λ a class of V-valued predicate liftings for F. The *Kantorovich lifting* of F w.r.t. Λ is the topological lifting F<sup>Λ</sup>: V-Cat → V-Cat that sends a V-category (X, a) to the V-category (FX, F<sup>Λ</sup>a), where F<sup>Λ</sup>a denotes the initial structure on FX w.r.t. the structured cone of all functions

$$
\lambda(f) \colon \mathsf{F}|X| \longrightarrow |\mathcal{V}|
$$

where λ ∈ Λ is κ-ary and f: (X, a) → V<sup>κ</sup> is a V-functor. Generally, a lifting F̄: V-Cat → V-Cat of F is *Kantorovich* if F̄ = F<sup>Λ</sup> for some class Λ of predicate liftings for F.

*Example 3.5.* As the name suggests, the prototypical example of a Kantorovich lifting is given by the (non-symmetric) Kantorovich distance between finite distributions, which arises as the Kantorovich lifting of the finite distribution functor on Set to the category BHMet w.r.t. the predicate lifting E that computes expected values, i.e.

$$
\mathsf{D}_{\omega}^{E}(X, a)(\mu, \upsilon) = \sup_{f \colon (X,a) \to [0,1]} E_X(f)(\upsilon) \ominus E_X(f)(\mu).
$$
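For small finite spaces, the supremum in Example 3.5 can be approximated by brute force. The sketch below is illustrative only: the function name `kantorovich`, the restriction of f to a finite grid of values, and the sample metric are assumptions of the sketch and do not come from the text. A map f is admitted when f(y) − f(x) ≤ d(x, y) for all x, y, which for a symmetric d is the usual non-expansiveness condition.

```python
from fractions import Fraction
from itertools import product

def kantorovich(points, d, mu, nu, steps):
    """Brute-force sup of E(f)(nu) - E(f)(mu) over admissible f on a grid.

    f ranges over maps X -> {0, 1/steps, ..., 1} with f(y) - f(x) <= d(x, y)
    for all x, y. Restricting f to a grid is an assumption of this sketch.
    """
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    best = Fraction(0)
    for values in product(grid, repeat=len(points)):
        f = dict(zip(points, values))
        if all(f[y] - f[x] <= d(x, y) for x in points for y in points):
            best = max(best, sum(f[x] * (nu[x] - mu[x]) for x in points))
    return best

# Hypothetical three-point space on a line, with neighbours at distance 1/4.
points = [0, 1, 2]
line = lambda x, y: Fraction(abs(x - y), 4)
discrete = lambda x, y: Fraction(0 if x == y else 1)
mu = {0: 1, 1: 0, 2: 0}   # Dirac distribution at 0
nu = {0: 0, 1: 0, 2: 1}   # Dirac distribution at 2

print(kantorovich(points, line, mu, nu, steps=4))      # distance under the line metric
print(kantorovich(points, discrete, mu, nu, steps=4))  # total variation distance
```

Swapping in the discrete metric removes all constraints on f, which is how the total variation distance of Example 3.2(3) arises from the same formula.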

We go on to exploit the universal property of initial lifts of cones to characterize the liftings that are Kantorovich. In the following, fix a functor F: Set → Set and a quantale V. Consider the partially ordered conglomerate Pred(F) of *classes* of V-valued predicate liftings for F ordered by containment, i.e. Λ ≤ Λ′ ⇐⇒ Λ ⊇ Λ′; and the partially ordered class Lift(F) of liftings of F to V-Cat ordered pointwise, i.e. F̄ ≤ F̄′ ⇐⇒ F̄a ≤ F̄′a, for every V-category (X, a).

**Definition 3.6.** Let F̄: V-Cat → V-Cat be a lifting of F. A κ-ary V-valued *predicate lifting* λ for F is *compatible with* F̄ if it restricts to a predicate lifting for F̄:

$$
\begin{array}{ccc}
\mathcal{V}\text{-}\mathsf{Cat}(-, \mathcal{V}^{\kappa}) & \overset{\lambda}{\dashrightarrow} & \mathcal{V}\text{-}\mathsf{Cat}(\overline{\mathsf{F}}-, \mathcal{V}) \\
\downarrow & & \downarrow \\
\mathsf{Set}(|-|, |\mathcal{V}^{\kappa}|) & \xrightarrow[\lambda]{} & \mathsf{Set}(\mathsf{F}|-|, |\mathcal{V}|)
\end{array}
$$

where the vertical arrows denote set inclusions – that is, if λ lifts V-functorial predicates on X to V-functorial predicates on F̄X. The class of all predicate liftings compatible with F̄ is denoted by P(F̄).

**Proposition 3.7.** *A* κ*-ary* V*-valued predicate lifting* λ *for* F *is compatible with* F̄ *iff the map* λ(1<sub>|V<sup>κ</sup>|</sub>): F(|V<sup>κ</sup>|) → |V| *is a* V*-functor of type* F̄V<sup>κ</sup> → V*.*

The Kantorovich lifting defines a universal construction:

**Theorem 3.8.** *Let* F: Set → Set *be a functor. Assigning to a class of predicate liftings for* F *the corresponding Kantorovich lifting yields a right adjoint* F<sup>(−)</sup>: Pred(F) → Lift(F) *whose left adjoint* P: Lift(F) → Pred(F) *maps a lifting* F̄ *of* F *to the class* P(F̄) *of all* V*-valued predicate liftings for* F *that are compatible with the lifting.*

The following result shows that Kantorovich liftings are characterized by a pleasant property that is required in multiple results in the context of coalgebraic approaches to *behavioural distance* (e.g. [5,22,12,40]).

**Theorem 3.9.** *A lifting of a* Set*-functor to* V*-*Cat *is Kantorovich iff it preserves initial morphisms.*

**Corollary 3.10.** *Every topological lifting of a functor* F: Set → Set *w.r.t. a class of natural transformations* λ: V*-*Cat(−, A<sub>λ</sub>) → Set(F−, |B<sub>λ</sub>|) *where each* A<sub>λ</sub> *is injective in* V*-*Cat *w.r.t. initial morphisms is Kantorovich.*

**Corollary 3.11.** *The composite of Kantorovich liftings is Kantorovich.*

*Example 3.12.* The characterization of Theorem 3.9 makes it easy to distinguish Kantorovich liftings.

1. It is an elementary fact that every lifting induced by a lax extension preserves initial morphisms (e.g. [18, Proposition 2.16]). In particular, the Wasserstein lifting [5] is Kantorovich.

2. The identity functor on Set has a lifting (−)◦: V-Cat → V-Cat that sends every V-category to its dual. Clearly, this lifting preserves initial morphisms, and hence it is Kantorovich. Indeed, one can show that it is the Kantorovich lifting of the identity functor w.r.t. the set of V-valued predicate liftings determined by the representable V-functors V<sup>op</sup> → V.

3. The functor (−)<sup>s</sup>: V-Cat → V-Cat<sub>sym</sub> that symmetrizes V-categories gives rise to a lifting (−)<sup>s</sup>: V-Cat → V-Cat of the identity functor on Set. Clearly, this functor preserves initial morphisms, and hence it is Kantorovich. Indeed, one can show that it is the Kantorovich lifting of the identity functor w.r.t. the set of all V-valued predicate liftings determined by the representable V-functors V<sub>s</sub> → V.

4. The discrete lifting of the identity functor on Set to V-Cat is *not* Kantorovich, as it fails to preserve initial morphisms.

5. The lifting of the identity functor on Set to V-Cat that sends a V-category (X, a) to the V-category given by the final structure w.r.t. the structured cospan of identity maps |(X, a)| → X ← |(X, a◦)| is *not* Kantorovich. This lifting generalizes Example 3.2(2).

6. The lifting of the finite distribution functor on Set to BHMet given by the Kantorovich distance is Kantorovich, while the lifting given by the total variation distance is *not* Kantorovich.
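The criterion of Theorem 3.9 can be tested by hand in the 2-valued case, where V-categories are preordered sets and a map f is initial iff x ≤ y holds exactly when f(x) ≤ f(y). The following sketch encodes preorders as sets of pairs; the helper names and the two-point example are hypothetical. It shows the discrete lifting of item 4 destroying initiality of a non-injective map, while the dual lifting of item 2 keeps it.

```python
def is_initial(f, X, leq_X, leq_Y):
    """f is initial iff x <= y in X holds exactly when f(x) <= f(y) in Y."""
    return all(((x, y) in leq_X) == ((f[x], f[y]) in leq_Y) for x in X for y in X)

def discrete(X):
    """Discrete preorder on X: only the reflexive pairs."""
    return {(x, x) for x in X}

def dual(leq):
    """Dual preorder: reverse every pair."""
    return {(y, x) for (x, y) in leq}

# Hypothetical example: an indiscrete two-point preorder collapsing onto one point.
X, Y = ["p", "q"], ["*"]
leq_X = {(x, y) for x in X for y in X}
leq_Y = {("*", "*")}
f = {"p": "*", "q": "*"}

print(is_initial(f, X, leq_X, leq_Y))              # f itself is initial
print(is_initial(f, X, dual(leq_X), dual(leq_Y)))  # still initial after dualizing
print(is_initial(f, X, discrete(X), discrete(Y)))  # no longer initial after discretizing
```

Both liftings act as the identity on underlying maps, so the same dictionary `f` is reused after lifting.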

#### **3.2 Liftings Induced by Lax Extensions**

We show next that lax extensions, functor liftings, and predicate liftings are linked by adjunctions, and characterize the liftings induced by lax extensions. We begin by showing that the Kantorovich extension and the Kantorovich lifting are compatible.

**Theorem 3.13.** *Let* F̂: V*-*Cat → V*-*Cat *be a lifting of a functor* F: Set → Set *induced by a lax extension* F̂: V*-*Rel → V*-*Rel*. If* F̂: V*-*Rel → V*-*Rel *is the Kantorovich extension w.r.t. a* class Λ *of predicate liftings, then the functor* F̂: V*-*Cat → V*-*Cat *is the Kantorovich lifting of* F: Set → Set *w.r.t.* Λ*.*

Let Lax(F) denote the partially ordered class of lax extensions of a functor F: Set → Set to V-Rel ordered pointwise:

$$
\widehat{\mathsf{F}} \le \widehat{\mathsf{F}}' \iff \forall r \in \mathcal{V}\text{-}\mathsf{Rel}.\ \widehat{\mathsf{F}}r \le \widehat{\mathsf{F}}'r;
$$

let Lift(F)<sub>I</sub> denote the partially ordered *subclass* of Lift(F) consisting of the liftings that preserve initial morphisms, and let Pred(F)<sub>M</sub> denote the partially ordered *subconglomerate* of Pred(F) of monotone predicate liftings. Clearly, the operations of taking Kantorovich extensions F̂<sup>(−)</sup>: Pred(F)<sub>M</sub> → Lax(F), and inducing liftings from lax extensions I: Lax(F) → Lift(F)<sub>I</sub> define monotone maps. Moreover, as we have seen in Theorem 3.9, the monotone map F<sup>(−)</sup>: Pred(F) → Lift(F) corestricts to Lift(F)<sub>I</sub>. Therefore, our results so far tell us that lax extensions, liftings and predicate liftings are connected through a diagram of monotone maps

$$\begin{array}{ccc}
\mathsf{Lax}(\mathsf{F}) & \xrightarrow{\;\mathsf{I}\;} & \mathsf{Lift}(\mathsf{F})_{\mathsf{I}} \\
{\scriptstyle \widehat{\mathsf{F}}^{(-)}} \big\uparrow & & {\scriptstyle \mathsf{P}} \big\downarrow \dashv \big\uparrow {\scriptstyle \mathsf{F}^{(-)}} \\
\mathsf{Pred}(\mathsf{F})_{\mathsf{M}} & \hookrightarrow & \mathsf{Pred}(\mathsf{F})
\end{array}$$

which commutes if the left adjoint is ignored. In the sequel, we will see that every monotone map in this diagram is an adjoint. In particular, it might not be immediately obvious that the monotone map F̂<sup>(−)</sup>: Pred(F)<sub>M</sub> → Lax(F) is a right adjoint without first thinking in terms of functor liftings induced by lax extensions, because the obvious guess – taking the predicate liftings induced by a lax extension (Definition 2.10) – in general does not define a monotone map Lax(F) → Pred(F)<sub>M</sub>. The next example illustrates this as well as the fact that there are predicate liftings compatible with a functor lifting induced by a lax extension that are not induced by the lax extension.

*Example 3.14.* The identity functor on Ord is the lifting induced by the identity functor on Rel as a lax extension of the identity functor on Set. The constant map into ⊤ is a monotone map 2 → 2 and, hence, determines a predicate lifting that is compatible with the identity functor on Ord. It is easy to see that this predicate lifting is induced by the largest extension of the identity functor; however, it is not induced by the identity functor on Rel [16, Example 3.12].

It should also be noted that the predicate liftings compatible with a functor lifting that preserves initial morphisms are not necessarily monotone. That is, the map P: Lift(F)<sup>I</sup> → Pred(F) does not necessarily corestrict to Pred(F)M.

*Example 3.15.* Consider the lifting (−)◦: Ord → Ord of the identity functor on Set that sends each preordered set to its dual. Then, the predicate lifting for (−)◦ determined by the V-functor hom(−, 0): (2, hom)<sup>op</sup> → (2, hom) is not monotone since it sends the constant map 0: 1 → 2 to the constant map 1: 1 → 2.
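The failure of monotonicity in Example 3.15 can be replayed on the two-element chain. This is a small sketch with hypothetical names (`hom`, `lam`, `const0`, `const1`), in which truth values are the integers 0 and 1 and the lifted predicate post-composes the given predicate with hom(−, 0).

```python
def hom(x, y):
    """Truth value (0 or 1) of x <= y in the two-element chain 2 = {0 <= 1}."""
    return 1 if x <= y else 0

def lam(f):
    """Predicate lifting for the identity functor determined by hom(-, 0)."""
    return lambda x: hom(f(x), 0)

const0 = lambda _: 0   # constant predicate 0 on the one-point set
const1 = lambda _: 1   # constant predicate 1 on the one-point set
point = "*"

# const0 <= const1 pointwise, but the lifted predicates are ordered the other way round.
print(lam(const0)(point), lam(const1)(point))
```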

Accordingly, we need to "filter the monotone predicate liftings" first. This operation trivially defines the left adjoint M: Pred(F) → Pred(F)<sub>M</sub> of the inclusion map Pred(F)<sub>M</sub> ↪ Pred(F).

**Theorem 3.16.** Let F: Set → Set be a functor. The monotone map I: Lax(F) → Lift(F)<sub>I</sub> is order-reflecting and right adjoint to the monotone map F̂<sup>MP(−)</sup>: Lift(F)<sub>I</sub> → Lax(F).

**Corollary 3.17.** Let F: Set → Set be a functor. The monotone map F̂<sup>(−)</sup>: Pred(F)<sub>M</sub> → Lax(F) is right adjoint to the order-reflecting monotone map MPI: Lax(F) → Pred(F)<sub>M</sub>.

Therefore, the interplay between lax extensions, liftings and predicate liftings is captured by the diagram

$$\begin{array}{ccc}
\mathsf{Lax}(\mathsf{F}) & \underset{\mathsf{I}}{\overset{\widehat{\mathsf{F}}^{\mathsf{MP}(-)}}{\leftrightarrows}} & \mathsf{Lift}(\mathsf{F})_{\mathsf{I}} \\
{\scriptstyle \mathsf{MPI}} \big\downarrow \dashv \big\uparrow {\scriptstyle \widehat{\mathsf{F}}^{(-)}} & & {\scriptstyle \mathsf{P}} \big\downarrow \dashv \big\uparrow {\scriptstyle \mathsf{F}^{(-)}} \\
\mathsf{Pred}(\mathsf{F})_{\mathsf{M}} & \overset{\mathsf{M}}{\leftrightarrows} & \mathsf{Pred}(\mathsf{F})
\end{array} \tag{3}$$

which commutes when only the right adjoints or only the left adjoints are considered. Finally, we characterize the liftings induced by lax extensions.

**Theorem 3.18.** A lifting F̄ of a Set-functor F to V-Cat is induced by a lax extension of F to V-Rel iff F̄ preserves initial morphisms and is locally monotone.

V-enriched lax extensions have proved to be crucial for deducing quantitative van Benthem and Hennessy-Milner theorems [38,39]. We recall that a lax extension F̂ of a functor F: Set → Set to V-Rel is V*-enriched* [39,16] if, for all u ∈ V, u ⊗ 1<sub>FX</sub> ≤ F̂(u ⊗ 1<sub>X</sub>); where u ⊗ r denotes the V-relation "r scaled by u", that is, (u ⊗ r)(x, y) = u ⊗ r(x, y).

**Theorem 3.19.** A lifting F̄ of a Set-functor F to V-Cat is induced by a V-enriched lax extension of F to V-Rel iff F̄ preserves initial morphisms and is V-Cat-enriched.

Our characterization of lax extensions makes it clear that there is a large collection of Kantorovich liftings that are not induced by lax extensions. For instance, it follows from Theorem 3.18 that the liftings (−)◦ : V-Cat → V-Cat and (−)<sup>s</sup> : V-Cat → V-Cat (see Example 3.12) of the identity functor on Set to V-Cat are Kantorovich but are not induced by lax extensions. Furthermore, as the composite of Kantorovich liftings is Kantorovich, in many situations it is possible to compose these functors with other Kantorovich liftings to generate liftings that are not induced by lax extensions.

## **4 Behavioural Distance**

One main motivation for lifting functors to metric spaces was to obtain coalgebraic notions of behavioural distance [5,38]. Indeed, every functor F: V-Cat → V-Cat gives rise to a notion of distance on F-coalgebras:

**Definition 4.1.** [12] Let (X, a, α) be a coalgebra for a functor F: V-Cat → V-Cat. The *behavioural distance* bd<sup>F</sup><sub>α</sub>(x, y) of x, y ∈ X is

$$\text{bd}_{\alpha}^{\mathsf{F}}(x, y) = \bigvee \{ b(f(x), f(y)) \mid f \colon (X, a, \alpha) \to (Y, b, \beta) \in \mathsf{CoAlg}(\mathsf{F}) \}. \tag{4}$$

Notice the analogy with the standard notion of behavioural equivalence: Two states are behaviourally equivalent if they can be made equal under some coalgebra morphism; and according to the above definition, two states in a metric coalgebra have low behavioural distance if they can be made to have low distance under some coalgebra morphism.

Kantorovich liftings and lax extensions are key ingredients in the aforementioned alternative coalgebraic approaches to behavioural distance on Set-based coalgebras. Let F: Set → Set be a functor. A Kantorovich lifting F<sup>Λ</sup>: V-Cat → V-Cat induces a notion of behavioural distance on an F-coalgebra α: X → FX as the greatest V-categorical structure (X, a) that makes α a V-functor of type (X, a) → F<sup>Λ</sup>(X, a) [5,22]. From Theorem 3.9 and [12, Proposition 12] (generalized to V-Cat, with the same proof), we obtain that this distance coincides with behavioural distance as defined above. On the other hand, every lax extension F̂: V-Rel → V-Rel of F also induces a behavioural distance on an F-coalgebra α: X → FX as the greatest simulation on α [32,40,13,38], i.e. the greatest V-relation s: X ⇸ X such that α · s ≤ F̂s · α. It follows by routine calculation that this distance coincides with the distance defined via the lifting induced by the lax extension and, hence, Theorem 3.13 ensures that, if we start with a collection of monotone predicate liftings, then the corresponding Kantorovich extension and Kantorovich lifting yield the same notion of behavioural distance. This allows including the approach to behavioural distance via lax extensions in the categorical framework for indistinguishability introduced recently by Komorida et al. [22]. On the other hand, there are notions of behavioural distance defined via Kantorovich liftings that do not arise via lax extensions. Indeed, it has been shown that the neighbourhood functor N: Set → Set does not admit a lax extension to Rel that preserves converses (F̂(r◦) = (F̂r)◦) whose (2-valued) notion of behavioural distance coincides with behavioural equivalence [27, Theorem 12]. However, from [12, Theorem 34, Proposition A.6] (see also [17]), we can conclude that the (2-valued) notion of behavioural distance defined by the canonical Kantorovich lifting of N to Equ w.r.t. the predicate lifting induced by the identity natural transformation N → N coincides with behavioural equivalence. (It is easy to see that Marti and Venema's result holds even if one allows lax extensions of N that do not preserve converses, and that the situation remains the same in the asymmetric case.)
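In the 2-valued case, the greatest simulation on a finite coalgebra can be computed by iterated refinement. The sketch below is a hedged illustration and fixes choices that the text leaves open: it assumes the finite powerset functor (unlabelled transition systems) and the lax extension that relates A to B when every element of A is s-related to some element of B. The names `greatest_simulation` and `succ` are hypothetical.

```python
def greatest_simulation(states, succ):
    """Largest relation s such that s(x, y) implies that every successor of x
    is s-related to some successor of y (one concrete reading of the
    simulation condition, assumed for the powerset functor)."""
    s = {(x, y) for x in states for y in states}   # start from the full relation
    changed = True
    while changed:
        changed = False
        for (x, y) in sorted(s):                   # iterate over a snapshot of s
            ok = all(any((x2, y2) in s for y2 in succ[y]) for x2 in succ[x])
            if not ok:
                s.discard((x, y))
                changed = True
    return s

# Hypothetical system: a has one transition to b; b and c are deadlocked.
states = ["a", "b", "c"]
succ = {"a": {"b"}, "b": set(), "c": set()}
sim = greatest_simulation(states, succ)
print(sorted(sim))
```

The loop only ever removes pairs, so it terminates on finite systems and returns the greatest fixpoint of the refinement step.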

## **5 Expressivity of Quantitative Coalgebraic Logics**

We proceed to connect the characterization of Kantorovich functors with existing expressivity results for quantitative coalgebraic logic, from now on focusing on symmetric V-categories. Accordingly, we read the V-categorical notions and results with V-Cat<sub>sym</sub> in place of V-Cat and V<sub>s</sub> in place of V.

We recall a variant [12] of (quantitative) coalgebraic logic [28,34,7,23,38] that follows the paradigm of interpreting modalities via predicate liftings, in this case of V-valued predicates for a V-Cat-functor (Section 2.3). Let Λ be a *set* of finitary predicate liftings for a functor F: V-Cat<sub>sym</sub> → V-Cat<sub>sym</sub>. The syntax of *quantitative coalgebraic modal logic* is then defined by the grammar

$$\phi ::= \top \mid \phi_1 \lor \phi_2 \mid \phi_1 \land \phi_2 \mid u \otimes \phi \mid \text{hom}_s(u, \phi) \mid \lambda(\phi_1, \dots, \phi_n) \quad (u \in \mathcal{V}, \lambda \in \Lambda)$$

where Λ is a set of *modalities* of finite arity, which we identify, by abuse of notation, with the given set Λ of predicate liftings. We view all other connectives as propositional operators. Let L(Λ) be the set of modal formulas thus defined.

The semantics is given by assigning to each formula φ ∈ L(Λ) and each coalgebra α: X → FX the *interpretation* of φ over α, i.e. the V-functor ⟦φ⟧<sub>α</sub>: X → V recursively defined as follows:

**–** for φ = ⊤, we take ⟦⊤⟧<sub>α</sub> to be the V-functor given by the constant map into ⊤;

**–** for an n-ary propositional operator p, we put ⟦p(φ<sub>1</sub>, ..., φ<sub>n</sub>)⟧<sub>α</sub> = p(⟦φ<sub>1</sub>⟧<sub>α</sub>, ..., ⟦φ<sub>n</sub>⟧<sub>α</sub>), with p interpreted using the lattice structure of V and the V-categorical structure hom<sub>s</sub> of V<sub>s</sub>, respectively, on the right-hand side;

**–** for n-ary λ ∈ Λ, we put ⟦λ(φ<sub>1</sub>, ..., φ<sub>n</sub>)⟧<sub>α</sub> = λ(⟨⟦φ<sub>1</sub>⟧<sub>α</sub>, ..., ⟦φ<sub>n</sub>⟧<sub>α</sub>⟩) · α, where ⟨⟦φ<sub>1</sub>⟧<sub>α</sub>, ..., ⟦φ<sub>n</sub>⟧<sub>α</sub>⟩ denotes the V-functor (X, a) → V<sup>n</sup> canonically determined by ⟦φ<sub>1</sub>⟧<sub>α</sub>, ..., ⟦φ<sub>n</sub>⟧<sub>α</sub>.

We then obtain a notion of logical distance:

**Definition 5.1.** Let Λ be a set of predicate liftings for a functor F: V-Cat → V-Cat. The *logical distance* ld<sup>Λ</sup><sub>α</sub> on an F-coalgebra (X, a, α) is the initial structure on X w.r.t. the structured cone of all maps ⟦φ⟧<sub>α</sub>: X → |(V, hom<sub>s</sub>)| with φ ∈ L(Λ). More explicitly, for all x, y ∈ X,

$$ld^{\Lambda}_{\alpha}(x,y) = \bigwedge \{ \operatorname{hom}_{s}([\![\phi]\!]_{\alpha}(x), [\![\phi]\!]_{\alpha}(y)) \mid \phi \in \mathcal{L}(\Lambda) \}.$$
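To make the recursive semantics and the resulting logical distance concrete, here is a minimal interpreter sketch. Everything specific in it is an assumption made for illustration: truth values are real numbers in [0, 1] read as distances (so ⊤ is 0, joins are minima, meets are maxima, u ⊗ φ is truncated addition and hom<sub>s</sub>(u, φ) is |u − φ|), the system type pairs a [0, 1]-valued label with a finite distribution over successors, and the modalities `obs` (nullary, reads the label) and `E` (unary, expected value) are hypothetical. The metric on the state space is ignored, and only a finite list of formulas is evaluated, so the result is a lower bound (in the usual order on reals) for the logical distance.

```python
def interpret(phi, system):
    """Return the map state -> [0,1] denoted by the formula `phi`.

    Formulas are nested tuples; `system` maps a state to (label, distribution).
    """
    op = phi[0]
    if op == "top":
        return {x: 0.0 for x in system}           # top is distance 0 (assumed reading)
    if op == "obs":                                # nullary modality: read the label
        return {x: system[x][0] for x in system}
    if op == "E":                                  # unary modality: expected value
        g = interpret(phi[1], system)
        return {x: sum(p * g[y] for y, p in system[x][1].items()) for x in system}
    if op == "tensor":                             # u (x) phi as truncated addition
        g = interpret(phi[2], system)
        return {x: min(1.0, phi[1] + g[x]) for x in system}
    if op == "homs":                               # hom_s(u, phi) as |u - phi|
        g = interpret(phi[2], system)
        return {x: abs(phi[1] - g[x]) for x in system}
    if op in ("join", "meet"):
        g, h = interpret(phi[1], system), interpret(phi[2], system)
        pick = min if op == "join" else max        # joins in the reversed order are minima
        return {x: pick(g[x], h[x]) for x in system}
    raise ValueError(f"unknown connective: {op}")

def logical_distance(formulas, system, x, y):
    """Largest difference in value between x and y over the given formulas."""
    values = [interpret(phi, system) for phi in formulas]
    return max(abs(v[x] - v[y]) for v in values)

# Hypothetical system: s0 and s1 loop on themselves, s2 moves to either with probability 1/2.
system = {
    "s0": (0.0, {"s0": 1.0}),
    "s1": (1.0, {"s1": 1.0}),
    "s2": (0.0, {"s0": 0.5, "s1": 0.5}),
}
formulas = [("obs",), ("E", ("obs",))]
print(logical_distance(formulas, system, "s0", "s2"))
print(logical_distance(formulas, system, "s0", "s1"))
```

The states s0 and s2 carry the same label and are only separated by the modal formula `E(obs)`, which is the kind of distinction the modalities are meant to contribute.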

In the remainder of the paper, we establish criteria under which a V-Cat<sub>sym</sub>-functor admits a set of predicate liftings for which logical and behavioural distances coincide. Recall that a (quantitative) coalgebraic logic is *expressive* if ld<sup>Λ</sup><sub>α</sub> ≤ bd<sup>F</sup><sub>α</sub>, for every F-coalgebra (X, α). (It is easy to show that the reverse inequality holds universally [12, Theorem 16].)

Existing expressivity results for quantitative coalgebraic logics for Set-functors depend crucially on Kantorovich liftings (e.g. [38,39,22,12]). However, it has been shown [12] that the Kantorovich property can be usefully detached from the notion of functor lifting.

**Definition 5.2.** Let Λ be a class of predicate liftings for a functor F: V-Cat → V-Cat. The functor F is Λ*-Kantorovich* if for every V-category X, the cone of all V-functors λ(f): FX → V, with λ ∈ Λ κ-ary and f ∈ V-Cat(X, V<sup>κ</sup>), is initial. A functor F: V-Cat → V-Cat is said to be *Kantorovich* if it is Λ-Kantorovich for some class Λ of predicate liftings for F.

Clearly, every Kantorovich lifting of a Set-functor to V-Cat w.r.t. a class Λ of predicate liftings is Λ-Kantorovich. Moreover, Theorem 3.9 is easily generalized to Kantorovich functors.

**Theorem 5.3.** *A* V*-*Cat*-functor is Kantorovich iff it preserves initial morphisms.*

**Theorem 5.4.** *A* V*-*Catsym*-functor is Kantorovich iff it preserves initial morphisms.*

*Example 5.5.* 1. The inclusion functor V-Cat<sub>sym,sep</sub> → V-Cat<sub>sym</sub> has a left adjoint (−)<sup>q</sup>: V-Cat<sub>sym</sub> → V-Cat<sub>sym,sep</sub> that quotients every X by its natural preorder, which for symmetric X is an equivalence, and gives rise to a Kantorovich functor on V-Cat<sub>sym</sub>.

2. Given a bounded-by-1 pseudometric space (X, d), i.e. an object of [0, 1]<sub>⊕</sub>-Cat<sub>sym</sub> ≅ BPMet, the *Prokhorov distance* [30] for probability measures on the measurable space of Borel sets of (X, d) is defined by d<sub>P</sub>(μ, υ) = inf{ε > 0 | μ(A) ≤ υ(A<sup>ε</sup>) + ε for all Borel sets A ⊆ X}, where A<sup>ε</sup> = {x ∈ X | inf<sub>y∈A</sub> d(x, y) ≤ ε}. It is straightforward to verify that this distance defines a BPMet-functor (which acts on morphisms by measuring preimages) that preserves isometries and, therefore, it is Kantorovich.

3. For every V-category (X, a), the functor (X, a) × −: V-Cat → V-Cat is Kantorovich. If the underlying lattice of V is Heyting, then under certain conditions this functor has a right adjoint [8,9] which is Kantorovich as well. Here, for X = (X, a) exponentiable, the right adjoint (−)<sup>X</sup> of X × − sends a V-category Y = (Y, b) to the V-category Y<sup>X</sup> = (Y<sup>X</sup>, c) with underlying set {all V-functors (1, k) × (X, a) → (Y, b)} and, for h, k ∈ Y<sup>X</sup>,

$$c(h,k) = \bigwedge_{x_1, x_2 \in X} b(h(x_1), k(x_2))^{a(x_1, x_2)},$$

where (−)<sup>u</sup>: V → V denotes the right adjoint of u ∧ −: V → V. For a V-functor f: (Y<sub>1</sub>, b<sub>1</sub>) → (Y<sub>2</sub>, b<sub>2</sub>), the V-functor f<sup>X</sup>: (Y<sub>1</sub><sup>X</sup>, c<sub>1</sub>) → (Y<sub>2</sub><sup>X</sup>, c<sub>2</sub>) sends h ∈ Y<sub>1</sub><sup>X</sup> to f · h.
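For finite spaces, where every subset is Borel, the Prokhorov distance of Example 5.5(2) can be computed by exhaustive search. The sketch below is an illustration under that finiteness assumption; the function name `prokhorov` and the sample space are hypothetical. It uses the observation that the ε-neighbourhood of a set only changes when ε crosses one of the finitely many distance values, so it suffices to examine one interval of ε per such value.

```python
from fractions import Fraction
from itertools import combinations

def prokhorov(points, d, mu, nu):
    """Prokhorov distance between finite distributions on a finite pseudometric space."""
    subsets = [set(c) for r in range(1, len(points) + 1)
               for c in combinations(points, r)]
    # The eps-neighbourhood of a set only changes when eps crosses a distance value.
    thresholds = sorted({Fraction(0)} | {d(x, y) for x in points for y in points})
    best = None
    for i, t in enumerate(thresholds):
        # Largest violation of mu(A) <= nu(A^t) over all non-empty subsets A.
        gap = Fraction(0)
        for A in subsets:
            A_t = {x for x in points if min(d(x, y) for y in A) <= t}
            gap = max(gap, sum(mu[x] for x in A) - sum(nu[x] for x in A_t))
        eps = max(t, gap)   # least eps >= t with mu(A) <= nu(A^t) + eps for all A
        upper = thresholds[i + 1] if i + 1 < len(thresholds) else None
        if upper is None or eps < upper:   # eps must stay inside the current interval
            best = eps if best is None else min(best, eps)
    return best

# Hypothetical two-point space with the points at distance 1.
points = ["x", "y"]
d = lambda p, q: Fraction(0 if p == q else 1)
half = Fraction(1, 2)
print(prokhorov(points, d, {"x": half, "y": half}, {"x": 1, "y": 0}))
print(prokhorov(points, d, {"x": 1, "y": 0}, {"x": 0, "y": 1}))
```

The search is exponential in the number of points, so it is only meant for checking small examples by hand.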

To ensure that a Kantorovich functor is represented by finitary predicate liftings, we need to impose a size constraint:

**Definition 5.6.** A functor F: V-Cat<sub>sym</sub> → V-Cat<sub>sym</sub> is ω*-bounded* if for every symmetric V-category X and every t ∈ FX, there exist a finite subcategory X<sub>0</sub> ⊆ X and t′ ∈ FX<sub>0</sub> such that t = Fi(t′), where i is the inclusion X<sub>0</sub> → X.

*Example 5.7.* Every lifting of a finitary Set-functor to V-Cat<sub>sym</sub> is ω-bounded.

**Proposition 5.8.** *Let* F: V*-*Catsym → V*-*Catsym *be a Kantorovich functor. If* F *is* ω*-bounded, then* F *is Kantorovich w.r.t. a set of finitary predicate liftings.*

Finally, from [12, Theorem 31] we obtain:

**Corollary 5.9.** *Let* V *be a finite quantale, and let* F: V*-*Catsym → V*-*Catsym *be a lifting of a finitary functor that preserves initial morphisms. Then there is a set* Λ *of predicate liftings for* F *of finite arity such that the coalgebraic logic* L(Λ) *is expressive.*

**Corollary 5.10.** *Let* F: BPMet → BPMet *be a functor that preserves isometries, is locally non-expansive, and admits a dense* ω*-bounded subfunctor. Then there is a set* Λ *of predicate liftings for* F *of finite arity such that the coalgebraic logic* L(Λ) *is expressive.*

These instantiate to results on concrete system types, e.g. ones induced by (sub)functors listed in Example 5.5, such as probabilistic transition systems equipped with a behavioural distance induced by the functor that sends a bounded metric space X to the subspace of the space of all probability measures on X equipped with the Prokhorov distance (see Example 5.5(2)) determined by the closure of the set of finitely supported probability measures.

## **6 Conclusions and Future Work**

Quantitative coalgebraic Hennessy-Milner theorems [23,38,12] assume that the functor (on metric spaces) describing the system type is *Kantorovich*, i.e. canonically induced by a suitable choice of – not necessarily monotone – predicate liftings, which then serve as the modalities of a logic that characterizes behavioural distance. We have shown as one of our main results that a functor on (quantale-valued) metric spaces is Kantorovich iff it preserves initial morphisms (i.e. isometries). As soon as such a functor additionally adheres to the expected size and continuity constraints (which replace the condition of finite branching found in the classical Hennessy-Milner theorem for labelled transition systems), one thus has a logical characterization of behavioural distance in coalgebras for the functor, in the sense that behavioural distance equals logical distance.

In fact we have shown that *every* functor on metric spaces can be captured by a generalized form of predicate liftings where the object of truth values may change along the lifting. A simple example is the discretization functor, which is characterized by a predicate lifting in which the truth value object for the input predicates is equipped with the indiscrete pseudometric, so that the lifting accepts *all* predicates instead of only non-expansive ones. This hints at a perspective to design heterogeneous modal logics that characterize behavioural distance for such functors, with modalities connecting different types of formulas (e.g. non-expansive vs. unrestricted), which we will pursue in future work. One application scenario for such a logic are behavioural distances on probabilistic systems involving total variation distance, which may be seen as a composite of the usual probabilistic Kantorovich functor and the discretization functor.

## **References**



## A Logical Framework with Higher-Order Rational (Circular) Terms

Zhibo Chen and Frank Pfenning

Carnegie Mellon University, Pittsburgh, PA, USA zhiboc@andrew.cmu.edu, fp@cs.cmu.edu

Abstract. Logical frameworks provide natural and direct ways of specifying and reasoning within deductive systems. The logical framework LF and subsequent developments focus on finitary proof systems, making the formalization of circular proof systems in such logical frameworks a cumbersome and awkward task. To address this issue, we propose CoLF, a conservative extension of LF with higher-order rational terms and mixed inductive and coinductive definitions. In this framework, two terms are equal if they unfold to the same infinite regular Böhm tree. Both term equality and type checking are decidable in CoLF. We illustrate the elegance and expressive power of the framework with several small case studies.

Keywords: Logical Frameworks, Circular Proofs, Regular Böhm Trees

## 1 Introduction

A logical framework provides a uniform way of formalizing and mechanically checking derivations for a variety of deductive systems common in the definitions of logics and programming languages. In this paper we propose a conservative extension of the logical framework LF [18] to support direct representations of rational (circular) terms and deductions.

The main methodology of a logical framework is to establish a bijective correspondence between derivations of a judgment in the object logic and canonical terms of a type in the framework. In this way, proof checking in the object logic is reduced to type checking in the framework. One notable feature of LF is the use of abstract binding trees, where substitution in the object logic can be encoded as substitution in the framework, leading to elegant encodings. On the other hand, encodings of rational terms, circular derivations, and their equality relations are rather cumbersome. We therefore propose the logical framework CoLF as a conservative extension of LF in which both circular syntactic objects and derivations in an object logic can be elegantly represented as higher-order rational dependently typed terms. This makes CoLF a uniform framework for formalizing proof systems on cyclic structures. We prove the decidability of type checking and soundness of equality checking of higher-order rational terms.

While CoLF allows formalization of circular derivations, proofs by coinduction about such circular encodings can only be represented as relations in CoLF, mirroring a similar limitation of LF regarding induction. In future work, we plan to extend CoLF to support checking of meta-theoretic properties of encodings analogous to the way Twelf [27] can check properties of encodings in LF.

The main contributions of this paper are:


An extended version of this paper, available at https://arxiv.org/abs/2210.06663, has an appendix that contains additional materials. We have implemented CoLF in OCaml and the implementation can be accessed at https://www.andrew.cmu.edu/user/zhiboc/colf.html. An additional case study, a meta-encoding of the term model of CoLF in CoLF, is presented in Appendix J of the extended version.

## 2 Mixed Inductive and Coinductive Definitions

We motivate our design through simple examples of natural numbers, conatural numbers, and finitely padded streams. The examples serve to illustrate the idea of coinductive interpretations, and they do not involve dependent types or higher-order terms. More complex examples will be introduced later in the case studies (Section 4).

Natural Numbers. The set of natural numbers is inductively generated by zero and successor. In a logical framework such as LF, one would encode natural numbers as the signature consisting of the first three lines in the top left part of Fig. 1.

The type theory ensures that canonical terms of the type nat are in one-to-one correspondence with the natural numbers. Specifically, the infinite stack of successors succ (succ (succ ...)) is not a valid term of type nat. Therefore, the circular term w1 is not a valid term.

Conatural Numbers. We may naturally specify that a type admits a coinductive interpretation by introducing a new syntactic kind cotype. The kind cotype behaves just like the kind type except that now the terms under cotype

```
nat : type.
zero : nat.
succ : nat -> nat.
w1 : nat = succ w1. (not valid)
conat : cotype.
cozero : conat.
cosucc : conat -> conat.
w2 : conat = cosucc w2.
w3 : conat = cosucc (cosucc w3).
eq : conat -> conat -> type.
eq/refl : eq N N.
eqw2w3 : eq w2 w3 = eq/refl.
                                  padding : type.
                                  pstream : cotype.
                                  cocons : nat -> padding -> pstream.
                                  pad : padding -> padding.
                                  next : pstream -> padding.
                                  s1 : pstream = cocons (succ zero)
                                              (pad (pad (next s1))).
                                  p2 : padding = pad p2. (not valid)
                                  s3 : pstream = cocons zero (next s3).
                                  s4 : pstream = cocons zero p5.
                                  p5 : padding = next s4.
                                  p6 : padding = pad p7. (not valid)
                                  p7 : padding = pad p6. (not valid)
```
Fig. 1. Signatures and Examples for Section 2

are allowed to be circular. A slightly adapted signature would encode the set of conatural numbers, shown as the first three lines in the bottom left part of Fig. 1.

Because conat is a coinductive type, the canonical forms of type conat include cosucc<sup>n</sup> cozero for all n and the infinite stack of cosucc, and they are in one-to-one correspondence with the set of conatural numbers. Specifically, the infinite stack of cosucc may be represented by the valid circular term w2 as in Fig. 1. The equality of terms in CoLF is the equality of the infinite trees generated by unfolding the terms, which corresponds to a bisimulation between circular terms. For example, an alternative representation of the infinite stack of cosucc is the term w3, and CoLF will treat w2 and w3 as equal terms, as shown by the last three lines in the bottom left part of Fig. 1. The terms w2 and w3 are proved equal by reflexivity. On the other hand, a formulation of conats in LF would involve an explicit constructor, e.g. mu : (conat -> conat) -> conat. The encoding of equality then becomes complicated, and one needs to work with an explicit equality judgment whenever a conat is used. Functions defined by coinduction (e.g., bisimulation in Appendix K of the extended version) need to be encoded as relations in CoLF.

#### 2.1 Finitely Padded Rational Streams

As an example of a mixed inductive and coinductive definition, we consider rational streams of natural numbers with finite paddings in between. These streams are special instances of left-fair streams [5]. We define streams coinductively and define paddings inductively, such that there are infinitely many numbers in the stream but only finitely many paddings between numbers, as shown in the signature consisting of the first five lines in the right column of Fig. 1. For example, the term s1 in Fig. 1 represents a stream of the natural number 1 with two paddings between consecutive numbers. Because padding is a type, the term p2 is not valid, as it is essentially an infinite stack of pad constructors. Definitions in a CoLF signature can refer to each other. Thus, the terms s3 and s4 denote the same padded stream, and the terms p6, p7 and p2 denote the same invalid stream consisting purely of paddings.

Priorities. To ensure the adequacy of representation, types of kind cotype admit circular terms while types of kind type admit only finitary terms. It is obvious that the circular term w1 is not a valid term of type nat due to the presence of an infinite stack of inductive constructors, and the circular term w2 is a valid term of type conat because it is a stack of coinductive constructors. However, when we have both inductive and coinductive types, it is unclear whether a circular term (e.g. s1) is valid. Historically, priorities are used to resolve this ambiguity [11]. A priority is assigned to each inductive or coinductive type, and constructors inherit priorities from their types. Constructors with the highest priority types are then viewed as primary. In CoLF, priorities are determined by the order of their declarations. Type families declared later have higher priorities than those declared earlier. In this way, the type pstream has higher priority than the type padding. Constructor cocons inherits the priority of pstream, and the term s1 is viewed as an infinite stack of cocons and is thus valid. Similarly, terms s3 and s4 are also valid. If we switch the order of declaration of padding and pstream (thereby switching their priorities), then terms s1, s3, and s4 are no longer valid.

## 3 The Type Theory

We formulate the type theory of CoLF, a dependent type theory with higher-order rational terms and decidable type checking. The higher-order rational terms correspond to ⊥-free regular Böhm trees [21] and have decidable equality.

#### 3.1 Higher-Order Rational Terms

When we consider first-order terms (terms without λ-binders), the rational terms are terms with only finitely many distinct subterms, and thus their equality is decidable. We translate this intuition to the higher-order setting. The higher-order rational terms are those with finitely many subterms up to renaming of free and bound variables. We give several examples of rational and non-rational terms using the signatures in Section 2.


In the definitions above, bolded symbols on the left of the equality signs are called recursion constants. It is crucial that in higher-order rational terms, all arguments to recursion constants are bound variables and not other kinds of terms. We call this restriction the prepattern restriction as it is similar to Miller's pattern restriction [24] except that we allow repetition of arguments. The prepattern restriction marks the key difference between the higher-order rational term **R<sup>2</sup>** and the infinitary term **up**. The term **up** is not rational because the argument to **up** is succ x, which is not a bound variable.

#### 3.2 Syntax

We build subsequent developments on canonical LF [19], a formulation of the LF type theory where terms are always in their canonical form. Canonical forms do not contain β-redexes and are fully η-expanded with respect to their typing, supporting bijective correspondences between object logic derivations and the terms of the framework. One drawback of this presentation is that canonical terms are not closed under syntactic substitutions, and the technique of hereditary substitution addresses this problem [29].

The syntax of the theory follows the grammar shown in Fig. 2. We use the standard notion of spines. For example, a term x M<sub>1</sub> M<sub>2</sub> M<sub>3</sub> will be written as x · (M<sub>1</sub>; M<sub>2</sub>; M<sub>3</sub>), where x is the head and M<sub>1</sub>; M<sub>2</sub>; M<sub>3</sub> is the spine. To express rational terms, we add recursive definitions of the form r : A = M to the signature, where M must be contractive (judgment M contra) in that the head of M must be a constant or a variable. Recursive definitions look like notational definitions [26], but their semantics are very different. Recursive definitions are interpreted recursively in that the definition M may mention the recursion constant r, and other recursion constants including those defined later in the signature, while notational definitions in LF [26] cannot be recursive. Recursion constants are treated specially as a syntactic entity that is different from variables or constructors (nonrecursive constants). To ensure the conservativity over LF, we further require all definitions in Σ to be linearly ordered. That is, only in the body of a recursive definition can we "forward reference", and we can only forward reference other recursion constants. All other declarations must strictly refer to names that have been defined previously. We write λx̄ and M̄ to mean a sequence of λ-abstractions and a sequence of terms respectively. We write x, y, z for variables, c, d for term constants (also called constructors), a for type family constants, and r, r′, r″ for recursion constants.
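The spine notation can be illustrated with a small Python sketch. It is not part of CoLF or its implementation: the class names `App` and `Neutral` and the functions `to_spine` and `show` are hypothetical, and the sketch covers only applications of a head to arguments, ignoring λ-abstractions, prepattern spines, and η-expansion.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class App:
    """Curried application `fn arg` in ordinary (non-spine) notation."""
    fn: "Term"
    arg: "Term"


Term = Union[str, App]  # a string is a variable or constant name


@dataclass(frozen=True)
class Neutral:
    """Spine form H · S: a head together with the tuple of its arguments."""
    head: str
    spine: Tuple["Neutral", ...] = ()


def to_spine(term: Term) -> Neutral:
    """Collect the arguments of a left-nested application into one spine."""
    args = []
    while isinstance(term, App):
        args.append(to_spine(term.arg))
        term = term.fn
    return Neutral(term, tuple(reversed(args)))


def show(n: Neutral) -> str:
    """Print H · (M1; ...; Mn); a head with an empty spine is printed bare."""
    if not n.spine:
        return n.head
    return f"{n.head} · ({'; '.join(show(a) for a in n.spine)})"
```

For instance, `show(to_spine(App(App(App("x", "M1"), "M2"), "M3")))` produces the string `x · (M1; M2; M3)`, matching the example in the text.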

To enforce the prepattern restriction, we use a technical device called prepattern Π-abstractions, and associated notion of prepattern variables and prepattern spines. Prepattern Π-abstractions are written as Πx ˆ: A<sub>2</sub>. A<sub>1</sub>, and x will be a prepattern variable (written x ˆ: A<sub>2</sub>) in A<sub>1</sub>. Moreover, in A<sub>1</sub>, if y is a variable of a prepattern type Πw ˆ: A<sub>2</sub>. B, then the prepattern application of y to x will be realized as the head y followed by a prepattern spine ([x]), written y · ([x]). The semantics is that prepattern variables may only be substituted by other prepattern variables, while ordinary variables can be substituted by arbitrary terms (which include other prepattern variables). In a well-typed signature, if r : A = M is a recursion declaration, then A consists of purely prepattern Π-abstractions (judgment A prepat) and for all r · S in the signature, S consists of purely prepattern applications and is thus called a prepattern spine (judgment S prepat). The prepattern variables are similar to those introduced by the ∇-operator [25], which models the concept of fresh names, but here in a dependently typed setting, types may depend on prepattern variables.

$$\begin{array}{llcl}
\text{Signatures} & \Sigma & ::= & \cdot \mid \Sigma, a : K \mid \Sigma, c : A \mid \Sigma, r : A = M \\
\text{Contexts} & \Gamma & ::= & \cdot \mid \Gamma, x : A \mid \Gamma, x \mathrel{\hat{:}} A \\
\text{Kinds} & K & ::= & \text{type} \mid \text{cotype} \mid \Pi x : A.\, K \mid \Pi x \mathrel{\hat{:}} A.\, K \\
\text{Canonical types} & A, B & ::= & P \mid \Pi x : A\_2.\, A\_1 \mid \Pi x \mathrel{\hat{:}} A\_2.\, A\_1 \\
\text{Atomic types} & P & ::= & a \cdot S \\
\text{Canonical terms} & M & ::= & R \mid \lambda x.\, M \\
\text{Neutral terms} & R & ::= & H \cdot S \\
\text{Heads} & H & ::= & x \mid c \mid r \\
\text{Spines} & S & ::= & M; S \mid [x]; S \mid ()
\end{array}$$

Fig. 2. The Syntax for CoLF

In an actual implementation, the usages of prepattern types may impose additional burdens on the programmer. As a remedy, the implementation could infer which variables are prepattern variables based on whether they appear as arguments to recursion constants and propagate such information.

#### 3.3 Trace Condition

In a signature Σ, we say that a type A is inductive if A = Πx<sub>1</sub> … Πx<sub>n</sub> : A<sub>n</sub>. a · S and a : Πy<sub>1</sub> … Πy<sub>m</sub> : B<sub>m</sub>. type, and a type A coinductive if A = Πx<sub>1</sub> … Πx<sub>n</sub> : A<sub>n</sub>. a · S and a : Πy<sub>1</sub> … Πy<sub>m</sub> : B<sub>m</sub>. cotype. A constructor c is inductive if c : A ∈ Σ and A is inductive, and c is coinductive if c : A ∈ Σ and A is coinductive.

The validity of the terms is enforced through a trace condition [17,8] on cycles. A trace is a sequence of constructor constants or variables, where each constructor or variable is a child of the previous one. A trace from a recursion constant r to itself is a sequence starting with the head of the definition of r and ending with the parent of an occurrence of r. In Fig. 1, a trace from p2 to itself is [pad], and a trace from s1 to itself is [cocons, pad, pad, next]. Traces cross into definitions of recursion constants. Thus, a trace from p6 to itself is [pad, pad], which is also a trace from p7 to itself. A trace from s4 to itself is [cocons, next], and a trace from p5 to itself is [next, cocons]. If r = λx.f (r x) (g (r x)) (more precisely r = λx. f · (r · ([x]); g · (r · ([x])))), then there are two traces from r to itself, i.e., [f] and [f,g].

A higher-order rational term M is *trace-valid* if for all recursion constants r in M, each trace from r to itself contains a coinductive constructor, and that coinductive constructor has the highest priority among all constructors on that trace. To ensure trace validity, it is sufficient to check that, in a recursive definition, all occurrences of recursion constants are *guarded by* some coinductive constructor of the highest priority. The guardedness condition (judgment ⊢<sub>Σ</sub> r - M) means that occurrences of r in M are guarded by some coinductive constructor of the highest priority, and the condition is decidable. In a well-typed signature Σ, if r : A = M ∈ Σ, then ⊢<sub>Σ</sub> r - M. A detailed algorithm for checking trace-validity is presented in Appendix B.2 of the extended version. The reader may check guardedness for all valid terms in Fig. 1.
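The trace condition can be checked mechanically on the first-order examples of Fig. 1. The following Python sketch is only an illustration under simplifying assumptions and is not the algorithm of Appendix B.2: it enumerates traces directly instead of checking guardedness, it handles only first-order definitions without binders, and the names `ORDER`, `KIND`, `FAMILY`, `DEFS`, `traces`, `reachable`, and `trace_valid` are hypothetical.

```python
# Type families in declaration order (later means higher priority) and their kinds.
ORDER = ["nat", "conat", "padding", "pstream"]
KIND = {"nat": "type", "conat": "cotype", "padding": "type", "pstream": "cotype"}
# The type family constructed by each constructor of Fig. 1.
FAMILY = {"zero": "nat", "succ": "nat", "cozero": "conat", "cosucc": "conat",
          "cocons": "pstream", "pad": "padding", "next": "padding"}
# Recursive definitions of Fig. 1 as (constructor, [arguments]). A string is a
# recursion constant if it is a key of DEFS, and a nullary constructor otherwise.
DEFS = {
    "w1": ("succ", ["w1"]),
    "w2": ("cosucc", ["w2"]),
    "s1": ("cocons", [("succ", ["zero"]), ("pad", [("pad", [("next", ["s1"])])])]),
    "p2": ("pad", ["p2"]),
    "s3": ("cocons", ["zero", ("next", ["s3"])]),
    "s4": ("cocons", ["zero", "p5"]),
    "p5": ("next", ["s4"]),
    "p6": ("pad", ["p7"]),
    "p7": ("pad", ["p6"]),
}


def traces(r, defs=DEFS):
    """All traces from r to itself, crossing into other recursive definitions."""
    found = []

    def walk(term, path, crossed):
        if isinstance(term, str):
            if term == r:
                found.append(path)
            elif term in defs and term not in crossed:
                walk(defs[term], path, crossed | {term})
            return
        head, args = term
        for arg in args:
            walk(arg, path + [head], crossed)

    walk(defs[r], [], frozenset())
    return found


def reachable(r, defs=DEFS):
    """Recursion constants that occur in the unfolding of r, including r."""
    seen, todo = set(), [r]
    while todo:
        q = todo.pop()
        if q in seen:
            continue
        seen.add(q)
        stack = [defs[q]]
        while stack:
            t = stack.pop()
            if isinstance(t, str):
                if t in defs:
                    todo.append(t)
            else:
                stack.extend(t[1])
    return seen


def trace_valid(r, order=ORDER, defs=DEFS):
    """Every trace must have a coinductive constructor of the highest priority."""
    def ok(trace):
        if not trace:
            return False
        top = max(trace, key=lambda c: order.index(FAMILY[c]))
        return KIND[FAMILY[top]] == "cotype"

    return all(ok(t) for q in reachable(r, defs) for t in traces(q, defs))
```

With the declaration order of Fig. 1, `trace_valid` accepts s1, s3, s4, and w2 and rejects w1, p2, p6, and p7. Passing an order in which pstream is declared before padding makes s1, s3, and s4 invalid, as described in Section 2.1.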

#### 3.4 Hereditary Substitution

Hereditary substitution [29,19] provides a method of substituting one canonical term into another while still obtaining a canonical term as the output, by performing type-based normalization. This technique simplifies the definition of the term equality in the original LF [18,20] by separating the term equality and normalization from type checking. We extend the definition of hereditary substitution to account for recursion constants. Hereditary substitution is a partial operation on terms. When the input term is not well-typed or the prepattern restriction is not respected, the output may be undefined.

Hereditary substitution takes as an extra argument the simple type of the term being substituted by. The simple type τ is inductively generated by the following grammar.

$$
\tau ::= \* \mid \tau\_1 \to \tau\_2
$$

We write A<sup>o</sup> for the simple type that results from erasing dependencies in A. We write [N/x]<sup>τ</sup>M for hereditarily substituting N for the free ordinary variable x in M. The definition proceeds by induction on τ and the structure of M. For prepattern variables, since they may only stand for other prepattern variables, we use a notion of renaming substitution. The renaming substitution ⟦y/x⟧M renames a prepattern variable or an ordinary variable x to the prepattern variable y in M. Both substitutions naturally extend to other syntactic kinds. Hereditary substitution relies on renaming substitution when reducing prepattern applications. Because of the prepattern restriction, recursion constants are only applied to prepattern variables in a well-formed signature, and we never substitute into a recursive definition. Let σ be a simultaneous renaming substitution, a notion generalized from renaming substitutions; we write ⟦σ⟧M for carrying out the substitution σ on M.

The definition for hereditary substitution is shown in Fig. 3. Appendix A of the extended version contains other straightforward cases of the definition. We note that prepattern Π-types erase to a base type ∗ because we may only apply terms of prepattern Π-types to prepattern variables, and thus the structure of the argument term does not matter.
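To make the idea of type-directed normalization concrete, here is a minimal Python sketch of hereditary substitution for simply typed canonical terms in spine form. It is an illustration under stated assumptions rather than the definition in Fig. 3: it omits recursion constants, prepattern variables, and renaming substitutions, and the names `Arrow`, `Lam`, `Neu`, `hsubst`, and `apply_spine` are hypothetical. The base type ∗ is represented by the string `"*"`.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple, Union


@dataclass(frozen=True)
class Arrow:
    dom: "Simple"
    cod: "Simple"


Simple = Union[str, Arrow]  # the base type is the string "*"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class Neu:
    head: str
    spine: Tuple["Term", ...] = ()


Term = Union[Lam, Neu]


def names(t):
    """All variable and constant names occurring in t, bound or free."""
    if isinstance(t, Lam):
        return {t.var} | names(t.body)
    return {t.head}.union(*(names(a) for a in t.spine))


def fresh(base, avoid):
    """A name derived from `base` that does not occur in `avoid`."""
    return next(f"{base}{k}" for k in count() if f"{base}{k}" not in avoid)


def rename(t, old, new):
    """Rename free occurrences of `old` to `new`; `new` must not occur in t."""
    if isinstance(t, Lam):
        return t if t.var == old else Lam(t.var, rename(t.body, old, new))
    head = new if t.head == old else t.head
    return Neu(head, tuple(rename(a, old, new) for a in t.spine))


def hsubst(n, x, tau, m):
    """Hereditarily substitute n (of simple type tau) for x in m."""
    if isinstance(m, Lam):
        if m.var == x:  # x is shadowed below this binder
            return m
        y, body = m.var, m.body
        if y in names(n):  # rename the binder to avoid capture
            z = fresh(y, names(n) | names(body) | {x})
            y, body = z, rename(body, y, z)
        return Lam(y, hsubst(n, x, tau, body))
    spine = tuple(hsubst(n, x, tau, a) for a in m.spine)
    if m.head != x:
        return Neu(m.head, spine)
    return apply_spine(spine, tau, n)  # the redex is reduced immediately


def apply_spine(spine, tau, n):
    """Reduce n against the spine, one argument per arrow in tau."""
    for arg in spine:
        if not (isinstance(tau, Arrow) and isinstance(n, Lam)):
            raise ValueError("undefined: the input is not well-typed")
        n, tau = hsubst(arg, n.var, tau.dom, n.body), tau.cod
    return n
```

For example, substituting λy. f · (y; y) for x at type ∗ → ∗ in x · (a) yields f · (a; a) directly, without ever producing a β-redex. The recursion terminates because each nested call to `hsubst` inside `apply_spine` is at a strictly smaller simple type.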

#### 3.5 Term Equality

The equality checking of circular terms is carried out by iteratively unfolding recursive definitions [1,6,14,23]. The algorithm here is a slight adaptation of the equality algorithm for regular Böhm trees by Huet [21], tailored to the specific case of CoLF's canonical term syntax. We emphasize that the equality algorithm can treat terms that are not trace-valid or well-typed, and is thus decoupled from validity checking and type checking. The algorithm itself checks for the prepattern restriction on recursion constants and contractiveness condition on recursive definitions. These checks are essential to ensure termination in the presence of forward referencing inside recursive definitions.

Fig. 3. Hereditary Substitutions

We define the judgment Δ; Θ ⊢<sub>Σ</sub> M = M′ to mean that M and M′, with free variables from Θ, are equal under the assumptions Δ, with consideration of recursive definitions in Σ. The variable list Θ is similar to Γ except that it does not carry the types of the variables. It is merely a list of pairwise distinct variables. Similarly, we define the judgment Δ; Θ ⊢<sub>Σ</sub> S = S′ to mean that the spines S and S′ are element-wise equal. Equalities in Δ will be of the form (Θ ⊢ M = M′) where Θ holds the free variables of M and M′. We write Θ ⊢ M to mean that FV(M) ⊆ Θ. We define simultaneous variable renaming: σ is a variable renaming from Θ′ to Θ, written Θ ⊢ σ : Θ′, to mean that if Θ′ ⊢ M, then Θ ⊢ ⟦σ⟧M. For instance, if we have x ⊢ ⟦x/y, x/z⟧ : y, z and y, z ⊢ y · [z], then x ⊢ ⟦x/y, x/z⟧(y · [z]), i.e., x ⊢ x · [x]. The rules for the judgments are presented in Fig. 4. Recall that M is contractive (M contra) if the head of M is not a recursion constant.

An Example. Assume the signature in Section 2.1, and consider a stream generator that repeats its arguments. The stream may be represented by terms r1 and r2 below. Note that in the concrete syntax, square brackets represent λ-abstractions.

    r1 : nat -> pstream = [x] cocons x (next (r1 x)).
    r2 : nat -> pstream = [x] cocons x (next (cocons x (next (r2 x)))).

Because r1 is a recursion constant, its type is a prepattern-Π type, and this restriction is respected in the body as x is a prepattern variable.

$$\boxed{\Delta; \Theta \vdash\_{\Sigma} M = M'}$$

$$\frac{\Theta \vdash \sigma : \Theta'}{\Delta, (\Theta' \vdash H \cdot S\_1 = H' \cdot S\_2); \Theta \vdash\_{\Sigma} [\![\sigma]\!](H \cdot S\_1) = [\![\sigma]\!](H' \cdot S\_2)} (1)$$

$$\frac{r : A = M \in \Sigma \quad S\_1 \text{ prepat} \quad M \text{ contra} \quad \Delta, (\Theta \vdash r \cdot S\_1 = H \cdot S\_2); \Theta \vdash\_{\Sigma} S\_1 \rhd^{A^o} M = H \cdot S\_2}{\Delta; \Theta \vdash\_{\Sigma} r \cdot S\_1 = H \cdot S\_2} (2)$$

$$\frac{r : A = M \in \Sigma \quad S\_2 \text{ prepat} \quad M \text{ contra} \quad H \neq r' \quad \Delta, (\Theta \vdash H \cdot S\_1 = r \cdot S\_2); \Theta \vdash\_{\Sigma} H \cdot S\_1 = S\_2 \rhd^{A^o} M}{\Delta; \Theta \vdash\_{\Sigma} H \cdot S\_1 = r \cdot S\_2} (3)$$

$$\frac{\Delta; \Theta \vdash\_{\Sigma} S = S'}{\Delta; \Theta \vdash\_{\Sigma} c \cdot S = c \cdot S'} \text{(4)} \quad \frac{\Delta; \Theta \vdash\_{\Sigma} S = S'}{\Delta; \Theta \vdash\_{\Sigma} y \cdot S = y \cdot S'} \text{(5)} \quad \frac{\Delta; \Theta \vdash\_{\Sigma} M = M'}{\Delta; \Theta \vdash\_{\Sigma} \lambda x.M = \lambda x.M'} \text{(6)}$$

$$\boxed{\Delta; \Theta \vdash\_{\Sigma} S = S'}$$

$$\frac{}{\Delta; \Theta \vdash\_{\Sigma} () = ()} \quad \frac{\Delta; \Theta \vdash\_{\Sigma} M = M' \quad \Delta; \Theta \vdash\_{\Sigma} S = S'}{\Delta; \Theta \vdash\_{\Sigma} M; S = M'; S'} \quad \frac{\Delta; \Theta \vdash\_{\Sigma} S = S'}{\Delta; \Theta \vdash\_{\Sigma} [x]; S = [x]; S'}$$

Fig. 4. Equality Checking

We want to show that r1 and r2 are equal in the framework. Let Σ be the signature of Section 2.1 plus the definitions for r1 and r2. We illustrate the process of checking that ·; · ⊢<sub>Σ</sub> λx. r1 · ([x]) = λx. r2 · ([x]) as a search procedure for a derivation of this judgment, where initially both Δ and Θ are empty.

Immediately after rule (6) we encounter ·; x ⊢<sub>Σ</sub> r1 · ([x]) = r2 · ([x]). We memoize this equality by storing (x ⊢ r1 · ([x]) = r2 · ([x])) in Δ as in rule (2), and unfold the left-hand side. Then we proceed with the following judgment.

$$(x \vdash \mathbf{r1} \cdot ([x]) = \mathbf{r2} \cdot ([x])); x \vdash\_{\Sigma} \mathbf{cocons} \cdot (x; \mathbf{next} \cdot (\mathbf{r1} \cdot ([x]))) = \mathbf{r2} \cdot ([x])$$

We then use rule (3) to unfold the right-hand side and store the current equation in the context. Then after several structural rules, we have

$$(x \vdash \mathbf{r1} \cdot ([x]) = \mathbf{r2} \cdot ([x])), \ldots; x \vdash\_{\Sigma} \mathbf{r1} \cdot ([x]) = \mathbf{cocons} \cdot (x; \mathbf{next} \cdot (\mathbf{r2} \cdot ([x])))$$

At this point, rule (2) applies. We add the current equation to the context and unfold the left recursive definition. Then after several structural rules, we encounter the following judgment.

$$(x \vdash \mathbf{r1} \cdot ([x]) = \mathbf{r2} \cdot ([x])), \dots; x \vdash\_{\Sigma} \mathbf{r1} \cdot ([x]) = \mathbf{r2} \cdot ([x])$$

Now we can close the derivation with rule (1) using the identity substitution.
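The search procedure of this example can be replayed with a short Python sketch. It is a simplification and not the CoLF implementation: terms are already under the binder for x, definition bodies contain no λ-abstractions, rule (1) is applied only with the identity renaming, and the names `DEFS`, `rename`, `unfold`, and `equal` are hypothetical.

```python
# Terms are one of:
#   "x"                  a variable
#   ("con", c, [args])   constructor c applied to a spine
#   ("rec", r, [vars])   recursion constant r applied to prepattern variables
# A definition is a pair (parameters, body).
DEFS = {
    "r1": (["x"], ("con", "cocons", ["x", ("con", "next", [("rec", "r1", ["x"])])])),
    "r2": (["x"], ("con", "cocons", ["x", ("con", "next", [
        ("con", "cocons", ["x", ("con", "next", [("rec", "r2", ["x"])])])])])),
}


def rename(term, sigma):
    """Apply a simultaneous variable renaming to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    tag, head, args = term
    return (tag, head, [rename(a, sigma) for a in args])


def unfold(term):
    """Replace r · ([ys]) by the body of r with its parameters renamed to ys."""
    _, r, ys = term
    params, body = DEFS[r]
    if not all(isinstance(y, str) for y in ys):
        raise ValueError(f"the arguments of {r} violate the prepattern restriction")
    if not isinstance(body, str) and body[0] == "rec":
        raise ValueError(f"the definition of {r} is not contractive")
    return rename(body, dict(zip(params, ys)))


def equal(left, right, delta=frozenset()):
    """Search for an equality derivation; delta holds the memoized equations."""
    key = (repr(left), repr(right))
    if key in delta:                                      # rule (1), identity renaming
        return True
    if not isinstance(left, str) and left[0] == "rec":    # rule (2)
        return equal(unfold(left), right, delta | {key})
    if not isinstance(right, str) and right[0] == "rec":  # rule (3)
        return equal(left, unfold(right), delta | {key})
    if isinstance(left, str) or isinstance(right, str):   # rule (5), bare variables
        return left == right
    (_, c1, s1), (_, c2, s2) = left, right                # rule (4) and spine rules
    return c1 == c2 and len(s1) == len(s2) and all(
        equal(a, b, delta) for a, b in zip(s1, s2))
```

Calling `equal(("rec", "r1", ["x"]), ("rec", "r2", ["x"]))` follows the same sequence of unfoldings as the derivation above and succeeds when the memoized equation between r1 · ([x]) and r2 · ([x]) is encountered again. Restricting rule (1) to the identity renaming is enough here because the bodies bind no new variables, so only finitely many equations over the variables in scope can arise.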

Decidability. Huet [21] has proved the termination, soundness, and completeness in the case of untyped regular Böhm trees. Our proof shares the essential idea with their proof. The termination relies on the fact that terms only admit finitely many subterms modulo renaming of both free and bound variables, and only subterms will appear in Δ. The soundness and completeness are proved with respect to the infinite Böhm tree [4] generated by unfolding the terms indefinitely, which again corresponds to a bisimulation between terms.

Theorem 1 (Decidability of Term Equality). *It is decidable whether* Δ; Θ ⊢<sub>Σ</sub> M = M′ *for any rational terms* M *and* M′*.*

*Proof.* We first show that there is a limit on the number of equations in Δ. Then the termination follows the lexicographic order of the assumption capacity (difference between current number of assumptions in Δ and the maximum), and the structure of the terms under comparison. It is obvious that rules (4)(5)(6) decompose the structure of the terms and rules (2)(3) reduce assumption capacity. It remains to show that the size of Δ has a limit.

The prepattern conditions on rules (2)(3) ensure that the expansion of recursive definitions will only involve renaming substitutions, and thus the resulting term will be an α-renaming of the underlying definition. No structurally new terms will be produced as a result of renaming substitution in rules (2)(3). We construct a finite set of all possible terms that could be added to the context. Each term is of finite depth and breadth limited by the existing constructs in the signature, and consists of finitely many constants, variables, and recursion constants. The constants and recursion constants are limited to those already presented in the signature. Although there are infinitely many variables, there are finitely many terms of bounded depth and width that are distinct modulo renaming of both bound and free variables. Thus, the set of terms that can appear as an element of Δ is finite, modulo renaming of free variables. The estimate of a rough upper bound can be found in Appendix D of the extended version.

We specify the infinite unfolding by specifying its unfolding to a Böhm tree of depth k, which is a finite approximation to the infinite Böhm tree, for each k ∈ ℕ. Then the infinite Böhm tree is the limit of all its finite approximations. We use the judgment exp<sub>(k)</sub>(M) =<sub>(k)</sub> M′ to denote the expansion of a higher-order rational term M to a Böhm tree M′ of depth k, and use the judgment exp(N) = N′ to express that the higher-order rational term N expands to the infinite Böhm tree N′. We also enrich the syntax of Böhm trees with prepattern variables. The full set of expansion rules can be found in Appendix E of the extended version. All cases are structural except for the following case when we expand a recursion constant, where we look up the definition of the recursion constant and plug in the arguments.

$$\exp\_{(k+1)}(r \cdot S) =\_{(k+1)} \exp\_{(k+1)}(S \rhd^{A^o} M) \text{ if } r: A = M \in \Sigma \text{ and } S \text{ prepat}$$
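A first-order reading of depth-k expansion can be sketched in Python as follows. This is an assumption-laden illustration and not the rule set of Appendix E: it ignores binders and prepattern arguments, the marker `CUT` for subtrees below the requested depth is a choice made for this sketch, and the names `DEFS` and `expand` are hypothetical. As in the rule above, unfolding a recursion constant does not consume depth.

```python
# Circular definitions from Fig. 1 as (constructor, [arguments]); a string is a
# recursion constant if it is a key of DEFS, and a nullary constructor otherwise.
DEFS = {
    "w2": ("cosucc", ["w2"]),
    "w3": ("cosucc", [("cosucc", ["w3"])]),
}
CUT = "..."  # stands for everything below depth k (an assumption of this sketch)


def expand(term, k):
    """Unfold `term` into a finite tree of depth k."""
    if k == 0:
        return CUT
    if isinstance(term, str):
        if term in DEFS:
            return expand(DEFS[term], k)  # unfolding keeps the depth index
        return term                       # nullary constructor
    head, args = term
    return (head, [expand(a, k - 1) for a in args])
```

Under these assumptions, `expand("w2", k)` and `expand("w3", k)` coincide for every k, which is the finite-approximation view of the equality of w2 and w3, whereas the finite conatural `("cosucc", ["cozero"])` differs from w2 from depth 2 onwards.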

Lemma 1 (Expansion Commutes with Hereditary Substitution). *For all* k*,* τ*,* M *and* N*,* exp<sub>(k)</sub>([N/x]<sup>τ</sup>M) =<sub>(k)</sub> [exp<sub>(k)</sub>(N)/x]<sup>τ</sup>(exp<sub>(k)</sub>(M)) *if defined.*

*Proof.* Directly by lexicographic induction on k and the structure of M.

#### Theorem 2 (Soundness of Term Equality).

*If* ·; Θ ⊢ M = M′*, then* exp<sub>(k)</sub>(M) =<sub>(k)</sub> exp<sub>(k)</sub>(M′) *for all* k*.*

*Proof.* By lexicographic induction on the depth k and the derivation Δ; Θ ⊢ M = M′. The case for the rule (1) is immediate by applying renaming substitutions at the closure rule. The cases for rules (2)(3) follow from the commutation lemma. The cases for rules (4)(5)(6) follow from the definition of exp.

#### Theorem 3 (Completeness of Term Equality).

*For rational terms* M *and* M′*, with free variables from* Θ*, if* exp(M) = exp(M′)*, then* ·; Θ ⊢ M = M′*.*

*Proof.* The equality algorithm is syntax-directed. We construct the derivation of ·; Θ ⊢ M = M′ by syntax-directed proof search following the structure of M. Every trace of exp(M) and exp(M′) corresponds to a trace in the derivation of ·; Θ ⊢ M = M′. If exp(M) = exp(M′), then the two terms are equal on every trace, and there will be exactly one rule that applies at every point in the construction of the equality derivation. Termination is assured by Theorem 1.

#### 3.6 Type Checking Rules

For type checking, we define the judgments in Fig. 5 by simultaneous induction. Because recursion constants may be forward referenced, we need to have access to later declarations that have not been checked during the checking of earlier declarations. In order to ensure the otherwise linear order of the declarations, the type checking judgments are parametrized by a pair of signatures Ξ; Σ, where Ξ is the local signature that contains type-checked declarations before the current declaration and Σ is the global signature that contains full signatures, including declarations that have not been checked. In particular, recursion constants available for forward-referencing will be in Σ but not Ξ. The type equality judgments Γ ⊢<sub>Σ</sub> A<sub>1</sub> = A<sub>2</sub> and Γ ⊢<sub>Σ</sub> P<sub>1</sub> = P<sub>2</sub> only need to read recursive definitions from the global signature, and do not need to access the local signature.

Fig. 5. Type Checking Judgments

A selection of type checking rules that are essential are presented in Fig. 6. The rest of the rules can be found in Appendix F of the extended version. To ensure the correct type checking order, i.e., the body of a recursive definition is checked after the types of all recursion constants within are checked, we defer checking the body of all recursive definitions to the end. This approach is viable because the term equality algorithm soundly terminates even when the recursive definition is not well-typed. For instance, if the signature Σ = c<sub>1</sub> : A<sub>1</sub>, c<sub>2</sub> : A<sub>2</sub>, r<sub>1</sub> : A<sub>3</sub> = M<sub>1</sub>, c<sub>3</sub> : A<sub>4</sub>, r<sub>2</sub> : A<sub>5</sub> = M<sub>2</sub>, then the order of checking is A<sub>1</sub>, A<sub>2</sub>, A<sub>3</sub>, A<sub>4</sub>, A<sub>5</sub>, M<sub>1</sub>, M<sub>2</sub>. This order is expressed in the type checking rules by an annotation on specific premise of the rules. The annotation [⊢<sub>Ξ;Σ</sub> M ⇐ A]<sup>1:deferred</sup> means that this judgment is to be checked after all the typing judgments have been checked. That is, when we check this premise, we have checked that ⊢<sub>Σ</sub> Σ sig. Because of the deferred checking of recursive definitions, the judgment ⊢<sub>Σ</sub> Ξ sig does not require the body of recursion declarations in Ξ to be well-typed. However, the categorical judgment Σ sig requires the body of every recursion declaration to be well-typed.
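The deferred checking order described above amounts to a simple two-pass traversal of the signature. The following Python sketch only computes the order in which types and bodies would be visited; it performs no type checking, and the representation of declarations as `(name, type, body)` triples and the function name `checking_order` are assumptions made for illustration.

```python
def checking_order(signature):
    """Return the items to check: all types first, then all deferred bodies.

    `signature` is a list of (name, type, body) triples in declaration order,
    where body is None for ordinary constants and the definition body for
    recursion constants.
    """
    types = [tp for _, tp, _ in signature]
    deferred = [body for _, _, body in signature if body is not None]
    return types + deferred


# The example signature from the text.
SIGNATURE = [
    ("c1", "A1", None),
    ("c2", "A2", None),
    ("r1", "A3", "M1"),
    ("c3", "A4", None),
    ("r2", "A5", "M2"),
]
```

For the example signature, `checking_order(SIGNATURE)` yields `["A1", "A2", "A3", "A4", "A5", "M1", "M2"]`, so every type in the signature has been visited before any recursive body is examined.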

To enforce the restriction that forward references only happen in a recursive definition, the annotation [or r : A = M ∈ Σ]<sup>2:definitions</sup> means that forward reference only occurs during the checking of recursive definitions (which are deferred) and nowhere else.

#### 3.7 Metatheorems

We state some properties about hereditary substitution and type checking.

#### Theorem 4 (Hereditary Substitution Respects Typing).

Given a checked signature Σ where Σ sig, if Γ ⊢<sub>Ξ;Σ</sub> N ⇐ A and Γ, x : A, Γ′ ⊢ M ⇐ B, then Γ, [N/x]<sup>A<sup>o</sup></sup>Γ′ ⊢<sub>Ξ;Σ</sub> [N/x]<sup>A<sup>o</sup></sup>M ⇐ [N/x]<sup>A<sup>o</sup></sup>B.

Proof. By induction on the second derivation, with similar theorems for other judgment forms. This proof is similar to those in [29,19]. Because of the prepattern restriction, hereditary substitutions do not occur inside recursive definitions and are thus similar to hereditary substitutions in LF.

#### Theorem 5 (Decidability of Type Checking).

All typing judgments are algorithmically decidable.

Proof. The type checking judgment is syntax directed. Hereditary substitutions are defined by induction on the erased simple types and always terminate. Equality of types ultimately reduces to equality of terms, and we have proved its termination in Section 3.5.

$$\boxed{\Sigma \text{ sig}} \qquad \frac{\vdash\_{\Sigma} \Sigma \text{ sig}}{\Sigma \text{ sig}}$$

$$\boxed{\vdash\_{\Sigma} \Xi \text{ sig}} \qquad \frac{}{\vdash\_{\Sigma} \cdot \text{ sig}} \qquad \frac{\vdash\_{\Sigma} \Xi \text{ sig} \quad \vdash\_{\Xi;\Sigma} K \Leftarrow \text{kind}}{\vdash\_{\Sigma} \Xi, a : K \text{ sig}} \qquad \frac{\vdash\_{\Sigma} \Xi \text{ sig} \quad \vdash\_{\Sigma} A \Leftarrow \text{(co)type}}{\vdash\_{\Sigma} \Xi, c : A \text{ sig}}$$

$$\frac{\vdash\_{\Sigma} \Xi \text{ sig} \quad \vdash\_{\Xi;\Sigma} A \Leftarrow \text{(co)type} \quad [\vdash\_{\Xi;\Sigma} M \Leftarrow A]^{1:\text{deferred}} \quad A \text{ prepat} \quad M \text{ contra} \quad \vdash\_{\Sigma} r \mathrel{-} M}{\vdash\_{\Sigma} \Xi, r : A = M \text{ sig}}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} K \Leftarrow \text{kind}} \qquad \frac{}{\Gamma \vdash\_{\Xi;\Sigma} \text{type} \Leftarrow \text{kind}} \qquad \frac{}{\Gamma \vdash\_{\Xi;\Sigma} \text{cotype} \Leftarrow \text{kind}} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} A \Leftarrow \text{(co)type} \quad \Gamma, x \overset{(\wedge)}{:} A \vdash\_{\Xi;\Sigma} K \Leftarrow \text{kind}}{\Gamma \vdash\_{\Xi;\Sigma} \Pi x \overset{(\wedge)}{:} A.\, K \Leftarrow \text{kind}}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} A \Leftarrow \text{(co)type}} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} A\_2 \Leftarrow \text{(co)type} \quad \Gamma, x \overset{(\wedge)}{:} A\_2 \vdash\_{\Xi;\Sigma} A\_1 \Leftarrow \text{(co)type}}{\Gamma \vdash\_{\Xi;\Sigma} \Pi x \overset{(\wedge)}{:} A\_2.\, A\_1 \Leftarrow \text{(co)type}} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} P \Rightarrow K \quad K = \text{type} / \text{cotype}}{\Gamma \vdash\_{\Xi;\Sigma} P \Leftarrow \text{(co)type}}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} P \Rightarrow K} \qquad \frac{a : K \in \Xi \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd K \Rightarrow K'}{\Gamma \vdash\_{\Xi;\Sigma} a \cdot S \Rightarrow K'}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} S \rhd K \Rightarrow K'} \qquad \frac{}{\Gamma \vdash\_{\Xi;\Sigma} () \rhd K \Rightarrow K} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} M \Leftarrow A\_2 \quad [M/x]^{A\_2^o} K = K' \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd K' \Rightarrow K''}{\Gamma \vdash\_{\Xi;\Sigma} M; S \rhd \Pi x : A\_2.\, K \Rightarrow K''}$$

$$\frac{y \mathrel{\hat{:}} A\_2' \in \Gamma \quad \Gamma \vdash\_{\Xi;\Sigma} A\_2' = A\_2 \quad [\![y/x]\!] K = K' \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd K' \Rightarrow K''}{\Gamma \vdash\_{\Xi;\Sigma} [y]; S \rhd \Pi x \mathrel{\hat{:}} A\_2.\, K \Rightarrow K''}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} M \Leftarrow A} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} R \Rightarrow P' \quad \Gamma \vdash\_{\Sigma} P' = P}{\Gamma \vdash\_{\Xi;\Sigma} R \Leftarrow P} \qquad \frac{\Gamma, x \overset{(\wedge)}{:} A\_2 \vdash\_{\Xi;\Sigma} M \Leftarrow A\_1}{\Gamma \vdash\_{\Xi;\Sigma} \lambda x.\, M \Leftarrow \Pi x \overset{(\wedge)}{:} A\_2.\, A\_1}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} R \Rightarrow P} \qquad \frac{(c/x : A \in \Gamma \text{ or } x \mathrel{\hat{:}} A \in \Gamma) \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd A \Rightarrow P}{\Gamma \vdash\_{\Xi;\Sigma} c/x \cdot S \Rightarrow P} \qquad \frac{r : A = M \in \Xi \ [\text{or } r : A = M \in \Sigma]^{2:\text{definitions}} \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd A \Rightarrow P}{\Gamma \vdash\_{\Xi;\Sigma} r \cdot S \Rightarrow P}$$

$$\boxed{\Gamma \vdash\_{\Xi;\Sigma} S \rhd A \Rightarrow P} \qquad \frac{}{\Gamma \vdash\_{\Xi;\Sigma} () \rhd P \Rightarrow P} \qquad \frac{\Gamma \vdash\_{\Xi;\Sigma} M \Leftarrow A\_2 \quad [M/x]^{A\_2^o} A\_1 = A\_1' \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd A\_1' \Rightarrow P}{\Gamma \vdash\_{\Xi;\Sigma} M; S \rhd \Pi x : A\_2.\, A\_1 \Rightarrow P}$$

$$\frac{y \mathrel{\hat{:}} A\_2' \in \Gamma \quad \Gamma \vdash\_{\Xi;\Sigma} A\_2' = A\_2 \quad [\![y/x]\!] A\_1 = A\_1' \quad \Gamma \vdash\_{\Xi;\Sigma} S \rhd A\_1' \Rightarrow P}{\Gamma \vdash\_{\Xi;\Sigma} [y]; S \rhd \Pi x \mathrel{\hat{:}} A\_2.\, A\_1 \Rightarrow P}$$

Fig. 6. Type Checking Rules (Condensed Selection)

## 4 Encoding Subtyping Systems for Recursive Types

In the presentation of case studies, we use the concrete syntax of our implementation, following Twelf [27]. The prepattern annotations are omitted. The full convention can be found in Appendix G of the extended version. Representations of circular derivations involve dependent usages of cotype's.

#### 4.1 Encoding a Classical Subtyping System

We present a mixed inductive and coinductive definition of subtyping using Danielsson and Altenkirch's [14] subtyping system. The system concerns the subtyping of types given by the following grammar.

$$\tau ::= \bot \mid \top \mid \tau\_1 \to \tau\_2 \mid \mu X. \tau\_1 \to \tau\_2 \mid X$$

The subtyping judgment is defined by five axioms and two rules, The axioms are

1. ⊥ ≤ τ (bot) 2. τ ≤ (top) 3. μX.τ<sup>1</sup> → τ<sup>2</sup> ≤ [μX.τ<sup>1</sup> → τ2/X](τ<sup>1</sup> → τ2) (unfold) 4. [μX.τ<sup>1</sup> → τ2/X](τ<sup>1</sup> → τ2) ≤ μX.τ<sup>1</sup> → τ<sup>2</sup> (fold) 5. τ ≤ τ (refl)

And the rules are shown below, where arr is coinductive and is written using a double horizontal line, and trans is inductive. The validity condition of mixed induction and coinduction entails that a derivation consisting purely of trans rules is not valid.

$$\begin{array}{c} \tau_1 \le \sigma_1 \qquad \sigma_2 \le \tau_2 \\ \hline\hline \sigma_1 \to \sigma_2 \le \tau_1 \to \tau_2 \end{array} \text{(arr)} \qquad \begin{array}{c} \tau_1 \le \tau_2 \qquad \tau_2 \le \tau_3 \\ \hline \tau_1 \le \tau_3 \end{array} \text{(trans)}$$

Danielsson and Altenkirch defined the rules using Agda's mixed inductive and coinductive datatypes (shown in Appendix H of the extended version); the encoding in CoLF is shown in Fig. 7. The curly brackets indicate explicit Π-abstractions, and the free capitalized variables are implicitly Π-abstracted. We note that the mixed inductive and coinductive nature of the subtyping rules is reflected in CoLF as two predicates, the inductive subtp and the coinductive subtpinf, where the latter has the higher priority. Clauses defining one predicate refer to the other predicate as a premise, e.g. subtp/arr and inf/arr. Let ⌜−⌝ denote the encoding relation, and we have ⌜μX.σ → τ⌝ = mu ⌜X.σ⌝ ⌜X.τ⌝.

#### Theorem 6 (Adequacy of Encoding).


```
tp : type.
bot : tp.
top : tp.
arr : tp -> tp -> tp.
mu : (tp -> tp) -> (tp -> tp) -> tp.

subtp : tp -> tp -> type.
subtpinf : tp -> tp -> cotype.
subtp/top : subtp T top.
subtp/bot : subtp bot T.
refl : subtp T T.
trans : subtp T1 T2 -> subtp T2 T3 -> subtp T1 T3.
subtp/arr : subtpinf T1 T2 -> subtp T1 T2.
unfold : {T1}{T2} subtp (mu T1 T2) (arr (T1 (mu T1 T2)) (T2 (mu T1 T2))).
fold : {T1}{T2} subtp (arr (T1 (mu T1 T2)) (T2 (mu T1 T2))) (mu T1 T2).
inf/arr : subtp T1 S1 -> subtp S2 T2 -> subtpinf (arr S1 S2) (arr T1 T2).
```
Fig. 7. An Encoding of Subtyping in CoLF


We give an example of the subtyping derivation of μX.X → X ≤ μX.(X → ⊥) → ⊤. Let S = μX.X → X and T = μX.(X → ⊥) → ⊤. In the derivation below, the two arr steps are the coinductive ones.

$$\dfrac{\dfrac{}{S \le S \to S}\,\text{unfold} \qquad \dfrac{\dfrac{\dfrac{\dfrac{(\mathtt{s\_sub\_t})\ S \le T \qquad \dfrac{}{\bot \le S}\,\text{bot}}{T \to \bot \le S \to S}\,\text{arr} \qquad \dfrac{}{S \to S \le S}\,\text{fold}}{T \to \bot \le S}\,\text{trans} \qquad \dfrac{}{S \le \top}\,\text{top}}{S \to S \le (T \to \bot) \to \top}\,\text{arr} \qquad \dfrac{}{(T \to \bot) \to \top \le T}\,\text{fold}}{S \to S \le T}\,\text{trans}}{(\mathtt{s\_sub\_t})\ S \le T}\,\text{trans}$$

Here is the encoding in CoLF:

```
s : tp = mu ([x] x) ([x] x).
t : tp = mu ([x] arr x bot) ([x] top).
s_sub_t : subtp s t =
    trans (unfold ([x] x) ([x] x)) (trans (subtp/arr (inf/arr
                    (trans (subtp/arr (inf/arr s_sub_t subtp/bot))
                        (fold ([x] x) ([x] x))) subtp/top))
                        (fold ([x] arr x bot) ([x] top))).
```
We note that the circular definition is valid by the presence of the constructor inf/arr along the trace from s\_sub\_t to itself. The presence of the coinductive arr rule is the validity condition of mixed inductive and coinductive definitions.
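The guardedness observation above can be illustrated with a small Python sketch. It is not how CoLF is implemented: the nested-tuple term representation and the function name `cycles_guarded` are assumptions made here for illustration, the type arguments of `fold` and `unfold` are dropped, and the actual validity condition of CoLF also involves priorities between predicates.

```python
# Hypothetical representation: a term is a tuple (constructor, arg1, ...);
# a bare string stands for a reference to a named (recursive) definition.
S_SUB_T = (
    "trans",
    ("unfold",),  # type arguments omitted in this sketch
    ("trans",
     ("subtp/arr",
      ("inf/arr",
       ("trans",
        ("subtp/arr", ("inf/arr", "s_sub_t", ("subtp/bot",))),
        ("fold",)),
       ("subtp/top",))),
     ("fold",)),
)


def cycles_guarded(term, name, guard, seen_guard=False):
    """Return True if every path from the root of `term` to an occurrence
    of `name` passes through the constructor `guard`."""
    if term == name:
        return seen_guard
    if isinstance(term, str):
        return True  # reference to some other definition
    head, *args = term
    seen = seen_guard or head == guard
    return all(cycles_guarded(a, name, guard, seen) for a in args)


# The cycle in s_sub_t passes through the coinductive constructor inf/arr.
valid = cycles_guarded(S_SUB_T, "s_sub_t", "inf/arr")
# A cycle built purely from trans is rejected.
only_trans = cycles_guarded(("trans", "d", "d"), "d", "inf/arr")
```

Under these assumptions, `valid` is true for the term of the example, while a definition such as `d = trans d d` fails the check, matching the remark that a derivation consisting purely of trans rules is not valid.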

There are two key differences between a CoLF encoding and an Agda encoding. First, in Agda one needs to use explicit names for μ-bound variables or de Bruijn indices, while in CoLF one uses abstract binding trees. Second, Agda does not have built-in coinductive equality but CoLF has built-in equality. In Agda, the one step of unfolding s\_sub\_t is not equal to s\_sub\_t, but in CoLF, they are equal.

#### 4.2 Encoding a Polarized Circular Subtyping System for Equirecursive Types

We present an encoding of a variant of Lakhani et al.'s polarized subtyping system [22] into CoLF. The system is circular. Due to space constraints, we only present the encoding for the positive types fragment and their emptiness derivations. This is an important part of the subtyping system because an empty type is a subtype of any other type. The full encoding of the polarized subtyping system can be found in Appendix I of the extended version.

Encoding of Positive Equirecursive Types. The equirecursive nature is captured by a signature Σ providing recursive definitions for type names t<sup>+</sup>.

$$\begin{array}{rcl} \tau^+, \sigma^+ & ::= & t_1^+ \otimes t_2^+ \mid \mathbf{1} \mid t_1^+ \oplus t_2^+ \mid \mathbf{0} \\ \Sigma & ::= & \cdot \mid \Sigma, t^+ = \tau^+ \end{array}$$

Equirecursive types are directly encoded as recursion constants in the system, and the framework automatically provides equirecursive type equality checking. Because equirecursive types are circular, positive types are encoded as cotype.

```
postp : cotype.
times : postp -> postp -> postp.
one : postp.
plus : postp -> postp -> postp.
zero : postp.
```
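To give some intuition for what equality checking of equirecursive types involves, here is a minimal Python sketch of a coinductive equality check on first-order circular types. It is only an illustration: the dictionary-based signature `SIG` and the function `equal` are assumptions made here, and CoLF's own algorithm works on higher-order rational terms rather than on this simplified representation.

```python
# Hypothetical signature: each type name is defined as a constructor
# applied to type names, so definitions may refer to each other cyclically.
SIG = {
    "one":  ("one",),
    "zero": ("zero",),
    "t":    ("times", "one", "t"),    # t  = 1 * t
    "u":    ("times", "one", "u2"),   # u  = 1 * u2
    "u2":   ("times", "one", "u"),    # u2 = 1 * u
    "v":    ("plus", "one", "v"),     # v  = 1 + v
}


def equal(a, b, sig, assumed=None):
    """Coinductive equality of two type names: a pair that is already
    under consideration is assumed equal, which closes the cycles."""
    assumed = set() if assumed is None else assumed
    if a == b or (a, b) in assumed:
        return True
    assumed.add((a, b))
    (con_a, *args_a), (con_b, *args_b) = sig[a], sig[b]
    if con_a != con_b or len(args_a) != len(args_b):
        return False
    return all(equal(x, y, sig, assumed) for x, y in zip(args_a, args_b))


same = equal("t", "u", SIG)        # both unfold to 1 * (1 * (1 * ...))
different = equal("t", "v", SIG)   # head constructors differ
```

Here `t` and `u` are two different finite presentations of the same infinite type, so the check succeeds even though their definitions are syntactically different.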
Theorem 7 (Adequacy of Type Encoding). *There is a bijection between circular types defined in an object signature for the positive types fragment and canonical forms of the* postp *in CoLF.*

*Proof.* By induction on the syntax in both directions.

Encoding of the Emptiness Judgment. The emptiness judgment t empty is defined by the following rules. We stress that these rules are to be interpreted coinductively.

$$\dfrac{}{\mathbf{0}\ \mathsf{empty}}\,(\mathbf{0}\mathsf{EMP}) \qquad \dfrac{t = t_1 \oplus t_2 \in \Sigma \quad t_1\ \mathsf{empty} \quad t_2\ \mathsf{empty}}{t\ \mathsf{empty}}\,(\oplus\mathsf{EMP})$$

$$\dfrac{t = t_1 \otimes t_2 \in \Sigma \quad t_1\ \mathsf{empty}}{t\ \mathsf{empty}}\,(\otimes\mathsf{EMP}_1) \qquad \dfrac{t = t_1 \otimes t_2 \in \Sigma \quad t_2\ \mathsf{empty}}{t\ \mathsf{empty}}\,(\otimes\mathsf{EMP}_2)$$

In CoLF, the rules are encoded as follows. The coinductive nature is reflected by the typing of empty : postp -> cotype, which postulates that the predicate empty is to be interpreted coinductively.

empty : postp -> cotype.

zero\_emp : empty zero.

plus\_emp : empty T1 -> empty T2 -> empty (plus T1 T2).

times\_emp\_1 : empty T1 -> empty (times T1 T2).

times\_emp\_2 : empty T2 -> empty (times T1 T2).

Theorem 8 (Adequacy of Encoding). *There is a bijection between the circular derivations of* t empty *and the canonical forms of the type* empty t*.*

*Proof.* By induction on the syntax of the circular derivation in both directions.

As an example, we may show that the type t, where t = **1** ⊗ t, is empty by the following circular derivation.

$$\frac{(\mathtt{t\_empty})\ t\ \mathsf{empty}}{(\mathtt{t\_empty})\ \mathbf{1} \otimes t\ \mathsf{empty}}\,\otimes\mathsf{EMP}_2$$

This derivation can be encoded as follows.

t : postp = times one t.

t\_empty : empty t = times\_emp\_2 t\_empty.
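The coinductive reading of the emptiness rules can be made concrete with a small Python sketch that computes emptiness as a greatest fixed point over a finite signature. The signature format and the function `empty_types` are hypothetical names chosen for this illustration; the sketch decides which type names have some circular emptiness derivation, it does not construct the derivation itself.

```python
def empty_types(sig):
    """Greatest fixed point of the emptiness rules: start by assuming that
    every name is empty and discard names whose definition cannot justify it."""
    empty = set(sig)
    changed = True
    while changed:
        changed = False
        for name in list(empty):
            con, *args = sig[name]
            justified = (
                con == "zero"                                          # 0EMP
                or (con == "plus" and all(a in empty for a in args))   # +EMP
                or (con == "times" and any(a in empty for a in args))  # *EMP1, *EMP2
            )
            if not justified:
                empty.discard(name)
                changed = True
    return empty


# Hypothetical signature: t = 1 * t (as in the example), n = 1 + n.
SIG = {
    "one":  ("one",),
    "zero": ("zero",),
    "t":    ("times", "one", "t"),
    "n":    ("plus", "one", "n"),
}
result = empty_types(SIG)
```

With this signature, `t` stays in the result because its emptiness is justified by the (assumed) emptiness of `t` itself, mirroring the circular derivation `t_empty`, whereas `n` is discarded because its summand `one` is inhabited.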

The reader is advised to take a look at Appendix I.3 of the extended version for two simple yet elegant examples of subtyping derivations.

## 5 Related Work

*Cyclic* λ*-Calculus and Circular Terms.* Ariola and Blom [2], and Ariola and Klop [3] studied the confluence property of reduction of cyclic λ-calculus. Their calculus differs from CoLF in several aspects. Their calculus is designed to capture reasoning principles of recursive functions and thus has a general recursive let structure that can be attached to terms at any level. Terms are equated up to infinite Lévy-Longo trees (with decidable equality), but equality as Böhm trees is not decidable. CoLF is designed for circular terms and circular derivations, and all recursive definitions occur at the top level. Terms are equated up to infinite Böhm trees and the equality is decidable. Our equality algorithm is adapted from Huet's algorithm for the regular Böhm trees [21]. Equality on first-order terms has been studied both in its own right [16] and in the context of subtyping for recursive types [1,6,14,23]. Our algorithm, when applied to first-order terms, is "the same". Courcelle [13] and Djelloul et al. [15] have studied the properties of first-order circular terms. Simon [28] designed a coinductive logic programming language based on first-order circular terms. Contrary to CoLF, there are no mutual dependencies between inductive and coinductive predicates in Simon's language.

*Logical Frameworks.* Harper et al. [18] designed the logical framework LF, which this work extends. Pfenning et al. later added notational definitions [26]. The method of hereditary substitution was developed as part of the research on linear and concurrent logical frameworks [9,29,10]. Harper and Licata demonstrated the method in formalizing the metatheory of simply typed λ-calculus [19]. In his master's thesis, Chen investigated a mixed inductive and coinductive logical framework with an infinite stack of priorities, but only in the context of a first-order type theory [12].

*Mixed Induction and Coinduction and Circular Proof Systems.* The equality and subtyping systems of recursive types [1,6,14,23,22] have traditionally recognized coinduction and more recently mixed induction and coinduction as an underlying framework. Fortier and Santocanale [17] devised a circular proof system for propositional linear sequent calculus with mixed inductive and coinductive predicates. This system together with Charatonik et al.'s Horn μ-calculus [11] motivated the validity condition of CoLF. Brotherston and Simpson devised an infinitary and a circular proof system as methods of carrying out induction [7,8]. Due to the complexity of their validity condition, the encoding of Brotherston and Simpson's system in full generality and Fortier and Santocanale's system is currently not immediate and is considered in ongoing work.

## 6 Conclusion

We have presented the type theory of a novel logical framework with higher-order rational terms that admit coinductive and mixed inductive and coinductive interpretations. We have proposed prepattern variables and prepattern Π-types to give a type-theoretic formulation of regular Böhm trees. Circular objects and derivations are represented as higher-order rational terms, as demonstrated in the case study of the subtyping deductive systems for recursive types.

We once again highlight the methodology of logical frameworks and what CoLF accomplishes. Logical frameworks internalize equalities that are present in the term model for an object logic. LF [18] internalizes αβη-equivalence of the dependently typed λ-calculus. Within LF, one is not able to write a specification that distinguishes two terms that are α- or β-equivalent, because those two corresponding derivations are identical in the object logic. Similarly, the concurrent logical framework CLF [29] internalizes equalities of concurrent processes that only differ in the order of independent events. The logical framework CoLF internalizes the equality of circular derivations. Using CoLF, one cannot write a specification that distinguishes between two different finitary representations of the same circular proof. It is this property that makes CoLF a more suitable framework for encoding circular derivations than existing finitary frameworks.

Acknowledgments. We would like to thank Robert Harper and Brigitte Pientka for insightful discussion on the research presented here and the anonymous reviewers for their helpful comments and suggestions.

## References



## A Higher-Order Language for Markov Kernels and Linear Operators

Pedro H. Azevedo de Amorim

Cornell University, Ithaca, NY, USA pamorim@cs.cornell.edu

Abstract. Much work has been done to give semantics to probabilistic programming languages. In recent years, most of the semantics used to reason about probabilistic programs fall into two categories: semantics based on Markov kernels and semantics based on linear operators. Both styles of semantics have found numerous applications in reasoning about probabilistic programs, but each has its strengths and weaknesses. Though it is believed that there is a connection between them, there are no languages that can handle both styles of programming. In this work we address this gap by defining a two-level calculus and its categorical semantics, which makes it possible to program with both kinds of semantics. From the logical side of things we see this language as an alternative resource interpretation of linear logic, where the resource being kept track of is sampling instead of variable use.

Keywords: Linear Logic, Probabilistic Programming, Categorical Semantics.

## 1 Introduction

Probabilistic primitives have been a standard feature of programming languages since the 1970s. At first, randomness was mostly used to program so-called randomized algorithms, i.e. algorithms that require access to a source of randomness. Recently, however, with the rise of computational statistics and machine learning, randomness is also used to program statistical models and inference algorithms.

Programming languages researchers have seen this rise in interest as an opportunity to further study the interaction of probability and programming languages, establishing it as an active subfield within the PL community.

One of the main goals of this subfield is giving semantics to programming languages that are expressive both in the regular PL sense and in their ability to program with randomness. One particular difficulty is that the mathematical machinery used for probability theory, i.e. measure theory, does not interact well with higher-order functions [2].

Currently, there are two classes of models of probabilistic programming, in its broad sense, that have found numerous applications: models based on linear logic and models based on Markov kernels. Since each kind of semantics has peculiarities that make it more or less adequate for giving semantics to expressive programming languages, it is an important theoretical question to understand how these classes of models are related.

Linear Logic for Probabilistic Semantics. The models of linear logic that have been used to give semantics to probabilistic languages are usually based on categories of vector spaces where programs are denoted by linear operators. We highlight two of them:


The main advantage of models based on linear logic is that programs are denoted by linear operators between spaces of distributions, a formalism that has been extensively used to reason about stochastic processes. This is illustrated by Dahlqvist and Kozen, who have used results from ergodic theory to reason about a Gibbs sampling algorithm written in their language, and by Clerc et al., who have shown how Bayesian inference can be given semantics using adjoints of linear operators [7].

Unfortunately, these insights are hard to realize in practice, since languages based on linear logic enforce that variables must be used exactly once, making them hard to use as programming languages. The usual way linear logic deals with this limitation is through the ! modality, which allows variables to be reused.

The problem with exponential modalities, when it comes to probabilistic programming, is that they are usually difficult to construct and do not have any clear interpretation in terms of probability, so that the linear operator formalism is no longer applicable. More operationally, through their connection with call-by-name (CBN) semantics [18], they make it mathematically hard to reuse sampled values.

Ehrhard et al. have found a way around this problem by introducing a call-by-value (CBV) let operator that allows samples to be reused [11,24]. In the discrete case this operator is elegantly defined by a categorical argument which is not known to scale to the continuous case. They deal with the continuous case by means of an ad hoc construction, and it is unclear whether this construction can be generalized to other models of linear logic. Therefore, our current understanding of models of linear logic does not provide a uniform way of reusing samples.

The difference between CBV and CBN can be illustrated by the program let x = coin in x + x, where coin is a primitive that outputs 0 or 1 with equal probability. In the CBN semantics each use of x corresponds to a new sample from coin, whereas in the CBV semantics the coin is only sampled once.
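The two readings of let x = coin in x + x can be compared by computing the resulting output distributions exactly. The following Python sketch is an illustration under the assumption that distributions are finite dictionaries from values to probabilities; the function names `cbv_double` and `cbn_double` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# A fair coin as a finite distribution: value -> probability.
COIN = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def cbv_double(dist):
    """CBV reading of `let x = dist in x + x`: sample once, reuse the value."""
    out = defaultdict(Fraction)
    for value, prob in dist.items():
        out[value + value] += prob
    return dict(out)


def cbn_double(dist):
    """CBN reading: each use of x draws a fresh, independent sample."""
    out = defaultdict(Fraction)
    for v1, p1 in dist.items():
        for v2, p2 in dist.items():
            out[v1 + v2] += p1 * p2
    return dict(out)


cbv = cbv_double(COIN)   # only 0 and 2 are possible
cbn = cbn_double(COIN)   # 1 is possible as well
```

Under CBV the result is 0 or 2, each with probability 1/2, whereas under CBN the outcome 1 occurs with probability 1/2 because the two uses of x may disagree.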

A subtler problem of probabilistic models based on linear logic is that they are ill-equipped to program with joint distributions. For instance, the language proposed by Ehrhard et al. can be easily extended with product types which, under their semantics, would make the type **R** × **R** be interpreted as M**R** × M**R**, where M**R** is the set of distributions over **R**. This set is isomorphic to the set of independent distributions over **R**<sup>2</sup>, so correlated joint distributions are not captured. Dahlqvist and Kozen deal with this issue by adding primitive types **R**<sup>n</sup> to their language, which are interpreted as the set of joint distributions over **R**<sup>n</sup>. However, since they are not defined using the type constructors provided by the semantic domain, programs of type **R**<sup>n</sup> can only be manipulated by primitives defined outside the language.
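A small finite example shows why a pair of marginal distributions cannot stand in for a joint distribution. The Python sketch below uses finite dictionaries as distributions; the helper names `marginals` and `product` are assumptions made for this illustration.

```python
from fractions import Fraction


def marginals(joint):
    """Project a joint distribution over pairs onto its two components."""
    left, right = {}, {}
    for (a, b), p in joint.items():
        left[a] = left.get(a, 0) + p
        right[b] = right.get(b, 0) + p
    return left, right


def product(mu, nu):
    """Independent product of two distributions."""
    return {(a, b): p * q for a, p in mu.items() for b, q in nu.items()}


half = Fraction(1, 2)
# A correlated joint distribution: the two components always agree.
correlated = {(0, 0): half, (1, 1): half}
mu, nu = marginals(correlated)
rebuilt = product(mu, nu)
```

Both marginals are fair coins, but their independent product assigns probability 1/4 to the pair (0, 1), which the original joint distribution never produces. The pair of marginals therefore loses the correlation.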

Markov Kernel Semantics. Markov kernels are a generalization of transition matrices, i.e. functions that map states to probability distributions over them. They are appealing from a programming languages perspective because their programming model is usually captured by monads and Kleisli arrows, a common abstraction in programming language semantics, and have been extensively used to reason about probabilistic programs [1,22,3]. By being related to monadic programming they differ from their linear operator counterpart by being able to naturally capture a call-by-value semantics which, as we argued above, is the most natural one for probabilistic programming.

Unfortunately, even though these semantics can be generalized to continuous distributions, they are notoriously brittle when it comes to higher-order programming. Only recently, with the introduction of quasi Borel spaces [15] and their probability monad, has it become possible to give a kernel-centric semantics to higher-order probabilistic programming with continuous distributions.

However, due to quasi Borel spaces being a different foundation to probability theory, it is unclear which theorems and theories can be generalized to higher-order. For instance, martingale theory has been used in Computer Science to reason about termination of probabilistic programs [6,20,16]. In order to generalize these ideas to higher-order functions it would be necessary to define a quasi Borel version of martingales and prove appropriate versions of the main theorems from martingale theory, a non-trivial task.

Our Work: Combining both Kinds of Semantics. Though both styles of semantics provide insights into how to interpret probabilistic programming languages (PPL), it is still too early to claim that we have a "correct" semantics which subsumes all of the existing ones. Both approaches mentioned above have their advantages and drawbacks.

In this work we shed some light on how both semantics relate to one another by showing that it is possible to use both styles of semantics to interpret a linear calculus that has higher-order functions, looser linearity restrictions, a uniform way of dealing with sample reuse and better syntax for programming joint distributions, while still being close to their kernel and linear operator counterparts. Interestingly, we identify the joint distribution problem described above to be a consequence of linear logic requiring the non-linear product to be cartesian. In order to tackle this problem we build on categorical semantics of linear logic and on recent work on Markov categories, a suitable categorical generalization of Markov kernels defined using semicartesian products.

We bridge the gap between these semantics by noting that the regular resource interpretation of linear logic, i.e. A ⊸ B being equivalent to "by using one copy of A I get one copy of B", is too restrictive an interpretation for probabilistic programming. Instead, we should think of usage as being equivalent to sampling. Therefore the linear arrow A ⊸ B should be thought of as "by sampling from A once I get B", which is the computational interpretation of Markov kernels.

We realize this interpretation through a multilanguage approach: we have one language that programs Markov kernels, a second language that programs linear operators, and syntax that transports programs from the former language into the latter one. To justify the viability of our categorical framework we show how existing probabilistic semantics are models of our language and show how, under mild conditions, this semantics can be generalized to commutative effects.

Our contributions are:


## 2 Mathematical Preliminaries

We are assuming that the reader is familiar with basic notions from category theory such as categories, functors and monads.

## Probability Theory

Transition matrices are one of the simplest abstractions used to model stochastic processes. Given two countable sets A and B, the entry (a, b) of a transition matrix is the probability of ending up in state b ∈ B when starting from the initial state a ∈ A; consequently, every row sums to 1.

Definition 1. *The category* **CountStoch** *has countable sets as objects and transition matrices as morphisms. The identity morphism is the identity matrix and composition is given by matrix multiplication.*
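Composition in **CountStoch** can be tried out on finite examples. The following Python sketch represents a transition matrix as a nested dictionary, an encoding chosen here purely for illustration; `compose` and `identity` are hypothetical helper names.

```python
from fractions import Fraction


def compose(f, g):
    """Composite 'f after g' of transition matrices, i.e. matrix multiplication.
    Matrices are nested dicts: g[a][b] is the probability of moving from a to b."""
    out = {}
    for a, row in g.items():
        acc = {}
        for b, p in row.items():
            for c, q in f[b].items():
                acc[c] = acc.get(c, 0) + p * q
        out[a] = acc
    return out


def identity(states):
    """Identity matrix on the given set of states."""
    return {a: {a: Fraction(1)} for a in states}


half = Fraction(1, 2)
flip = {"start": {"H": half, "T": half}}
noisy = {"H": {"H": Fraction(9, 10), "T": Fraction(1, 10)},
         "T": {"T": Fraction(1)}}
both = compose(noisy, flip)
```

The composite again has rows that sum to 1, and composing with the identity matrix leaves a matrix unchanged, as required of a category.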

Though transition matrices are conceptually simple, they can only model discrete probabilistic processes and, in order to generalize them to continuous probability, we must use measurable sets and Markov kernels.

Definition 2. *A measurable set is a pair* (A, Σ<sub>A</sub>)*, where* A *is a set and* Σ<sub>A</sub> ⊆ P(A) *is a* σ*-algebra, i.e. it contains the empty set and it is closed under complements and countable unions.*

Definition 3. *A function* f : (A, Σ<sub>A</sub>) → (B, Σ<sub>B</sub>) *is called measurable if for every* Y ∈ Σ<sub>B</sub>*,* f<sup>−1</sup>(Y) ∈ Σ<sub>A</sub>*.*

Definition 4. *Let* (A, Σ<sub>A</sub>) *be a measurable space. A probability distribution over* (A, Σ<sub>A</sub>) *is a function* μ : Σ<sub>A</sub> → [0, 1] *such that* μ(∅) = 0*,* μ(A) = 1 *and* μ(⨄<sub>i∈**N**</sub> A<sub>i</sub>) = ∑<sub>i∈**N**</sub> μ(A<sub>i</sub>) *for pairwise disjoint* A<sub>i</sub> ∈ Σ<sub>A</sub>*.*

Given two measurable sets (A, Σ<sub>A</sub>) and (B, Σ<sub>B</sub>) it is possible to define a σ-algebra over A × B, denoted by Σ<sub>A</sub> ⊗ Σ<sub>B</sub>, which is generated by the sets X × Y, where X ∈ Σ<sub>A</sub> and Y ∈ Σ<sub>B</sub>. Furthermore, every pair of distributions μ<sub>A</sub> and μ<sub>B</sub> over A and B, respectively, can be lifted to a distribution μ<sub>A</sub> ⊗ μ<sub>B</sub> over A × B such that (μ<sub>A</sub> ⊗ μ<sub>B</sub>)(X × Y) = μ<sub>A</sub>(X) μ<sub>B</sub>(Y), for X ∈ Σ<sub>A</sub> and Y ∈ Σ<sub>B</sub>.

Definition 5. *Let* (A, Σ<sub>A</sub>) *and* (B, Σ<sub>B</sub>) *be two measurable spaces. A Markov kernel is a function* f : A × Σ<sub>B</sub> → [0, 1] *such that*

- *for every* a ∈ A*,* f(a, −) *is a probability distribution over* (B, Σ<sub>B</sub>)*;*
- *for every* Y ∈ Σ<sub>B</sub>*,* f(−, Y) *is a measurable function.*

Definition 6. *The category* **Kern** *has measurable sets as objects and Markov kernels as morphisms. The identity arrow is the function* id<sub>A</sub>(a, X) = 1 *if* a ∈ X *and* 0 *otherwise, and composition is given by* (f ∘ g)(a, C) = ∫ f(−, C) d(g(a, −))*.*
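As a sanity check on the composition formula, consider the special case where g : A → B and f : B → C are kernels and B is countable with the discrete σ-algebra. The distribution g(a, −) is then determined by its point masses, and the integral reduces to a sum, which is exactly the matrix product used in **CountStoch**:

$$(f \circ g)(a, C) = \int f(-, C)\, d(g(a, -)) = \sum_{b \in B} g(a, \{b\})\, f(b, C).$$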

### Markov Categories

The field of categorical probability was developed in order to get a more conceptual understanding of Markov kernels. One of its cornerstone definitions is that of a Markov category: a category where objects are abstract sample spaces, morphisms are abstract Markov kernels and every object has "contraction" and "weakening" morphisms which correspond to duplicating and discarding a sample, respectively, without adding any new randomness.

Definition 7 (Markov category [12]). *A Markov category is a semicartesian symmetric monoidal category* (**C**, ⊗, 1) *in which every object* X *comes equipped with a commutative comonoid structure, denoted by* copy<sub>X</sub> : X → X ⊗ X *and* delete<sub>X</sub> : X → 1*, where* copy *satisfies*

$$\mathsf{copy}_{X\otimes Y} = (id_X \otimes b_{Y,X} \otimes id_Y) \circ (\mathsf{copy}_X \otimes \mathsf{copy}_Y),$$

where b<sub>Y,X</sub> is the natural isomorphism Y ⊗ X ≅ X ⊗ Y. The category being semicartesian means that the monoidal product comes equipped with projection morphisms π<sub>1</sub> : A ⊗ B → A and π<sub>2</sub> : A ⊗ B → B, but it is not Cartesian because the equation (π<sub>1</sub> ∘ f, π<sub>2</sub> ∘ f) = f does not hold in general which, intuitively, corresponds to the fact that joint distributions might be correlated.

Theorem 1 ([12]). **CountStoch** *is a Markov category.*

The monoidal product is given by the Cartesian product and the monoidal unit is the singleton set. The copy<sub>X</sub> morphism is the matrix X × X × X → [0, 1] which is 1 in the positions (x, x, x) and 0 elsewhere, and the delete<sub>X</sub> morphism is the constant 1 matrix indexed by X.
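For a finite set X these morphisms can be written down explicitly and two of the expected laws checked numerically. The Python sketch below is an illustration only: matrices are nested dictionaries, `"*"` is a name chosen here for the unique element of the monoidal unit, and `proj1` plays the role of the projection π<sub>1</sub> obtained by discarding the second component.

```python
from fractions import Fraction


def compose(f, g):
    """Composite 'f after g' of transition matrices given as nested dicts."""
    out = {}
    for a, row in g.items():
        acc = {}
        for b, p in row.items():
            for c, q in f[b].items():
                acc[c] = acc.get(c, 0) + p * q
        out[a] = acc
    return out


X = ["a", "b"]
copy_X = {x: {(x, x): 1} for x in X}              # 1 exactly at (x, (x, x))
delete_X = {x: {"*": 1} for x in X}               # constant-1 matrix
proj1 = {(x, y): {x: 1} for x in X for y in X}    # discard the second component

# An arbitrary stochastic matrix f : X -> X.
f = {"a": {"a": Fraction(1, 3), "b": Fraction(2, 3)},
     "b": {"b": Fraction(1)}}

counit_law = compose(proj1, copy_X)       # copying, then discarding one copy
discard_after_f = compose(delete_X, f)    # running f, then discarding the result
```

Copying and then discarding one copy gives the identity, and discarding the output of f is the same as discarding its input, which is the semicartesian property: since every row of f sums to 1, no information about f survives deletion.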

Theorem 2 ([12]). **Kern** *is a Markov category.*

This category is the continuous generalization of **CountStoch**: the monoidal product is the Cartesian product with the product σ-algebra and the monoidal unit is the singleton set {∗}. The copy<sub>X</sub> morphism is the Markov kernel copy<sub>X</sub> : X × (Σ<sub>X</sub> ⊗ Σ<sub>X</sub>) → [0, 1] such that copy<sub>X</sub>(x, S × T) = 1 if x ∈ S ∩ T and 0 otherwise. Its delete morphism is simply the function that, given any element in X, returns the function which is 1 on the measurable set {∗} and 0 on the empty measurable set.

## Linear Logic and Monoidal Categories

We recall the categorical semantics of the multiplicative fragment of linear logic (MLL):

Definition 8 ([21]). A category **C** is an MLL model if it is symmetric monoidal closed (SMCC), i.e. the functors A ⊗ − have a right adjoint A ⊸ −.

We denote the monoidal product as ⊗ and the space of linear maps between objects X and Y as X ⊸ Y. The morphism ev : ((X ⊸ Y) ⊗ X) → Y is the counit of the monoidal closed adjunction and cur : **C**(X ⊗ Y, Z) → **C**(X, Y ⊸ Z) is the linear currying map. We use the triple (**C**, ⊗, ⊸) to denote such models.

Definition 9. Let (**C**, ⊗<sub>**C**</sub>, 1<sub>**C**</sub>) and (**D**, ⊗<sub>**D**</sub>, 1<sub>**D**</sub>) be two monoidal categories. We say that a functor F : **C** → **D** is lax monoidal if there is a morphism ε : 1<sub>**D**</sub> → F(1<sub>**C**</sub>) and a natural transformation μ<sub>X,Y</sub> : F(X) ⊗<sub>**D**</sub> F(Y) → F(X ⊗<sub>**C**</sub> Y) making the diagrams in Figure 8 (in Appendix B) commute.

If ε and μ<sub>X,Y</sub> are isomorphisms we say that F is strong monoidal.

One key observation of this paper is that there are many lax monoidal functors between Markov categories and models of linear logic that can interpret probabilistic processes.

## 3 Syntax

In this section we will design a syntax that reflects the fact that linearity corresponds to sampling, not variable usage. We achieve this by making use of a multi-language semantics that enables the programmer to transport programs defined in a Markov kernel-centric language (MK) to a linear, higher-order, language (LL).

Our thesis is that in the context of probabilistic programming, linear logic, through its connection with linear algebra, departs from its usual Computer Science applications of enforcing syntactic invariants and, instead, provides a natural mathematical formalism to express ideas from probability theory, as shown by Dahlqvist and Kozen [8].

Therefore, since many probabilistic programming constructs, such as Bayesian inference and Markov kernels, can be naturally interpreted in linear logic terms, we believe that our calculus allows the user to benefit from the insights linearity provides to PPL while unburdening them from worrying about syntactic restrictions by making it possible to also program using kernels.

```
τ := 1 | τ × τ
M, N := x | unit | let x = M in N | (M, N) | π1 M | π2 N | f(M)
Γ := · | x : τ, Γ
```

Fig. 1: Syntax MK

We use standard notation from the literature: Γ ⊢ t : τ means that the program t has type τ under context Γ, t{x/u} means substitution of u for x in t, and t{x⃗/u⃗} is the simultaneous substitution of the term list u⃗ for the variable list x⃗ in t.

Both languages will be defined in this section and, for presentation's sake, we are going to use orange to represent MK programs and purple to represent LL programs.

## 3.1 A Markov Kernel Language

We need a language to program Markov kernels. Since we are aiming at generality, we are assuming the least amount of structure possible. As such we will be working with the internal language of Markov categories, as presented in Figure 1 and Figure 4<sup>1</sup>. Note that we are implicitly assuming a set of primitives for the functions f.

By construction, every Markov category can interpret this language, as we show in Figure 6, with types being interpreted as

$$\begin{aligned} \left[1\right] &= 1\\ \left[\tau\_1 \times \tau\_2\right] &= \left[\tau\_1\right] \times \left[\tau\_2\right] \end{aligned} $$

and the contexts are interpreted using × over the interpretation of the types. However, as it stands, it is not very expressive, since it does not have any probabilistic primitives nor does it have any interesting types since 1 × 1 ∼= 1.

When working with concrete models (cf. Section 5) we can extend the language with more expressive types as well as with concrete probabilistic primitives. For instance, in the context of continuous probabilities we could add an **R** datatype and a uniform distribution primitive · ⊢ uniform : **R**.

<sup>1</sup> cf. Appendix A.

<u>τ</u> := 1 | <u>τ</u> ⊸ <u>τ</u> | <u>τ</u> ⊗ <u>τ</u>

t, u := x | unit | λx. t | t u | t ⊗ u | let x ⊗ y = t in u

Γ := · | x : <u>τ</u> , Γ

Fig. 2: Syntax LL

Note that even though this language does not have any explicit sampling operators, sampling is implicitly achieved by the let operator. For instance, the program let x = uniform in x + x samples from a uniform distribution, binds the result to the variable x and adds the sample to itself.
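The reading of let as implicit sampling can be illustrated with a small Python sketch. It is only an illustration under assumptions: the `uniform` primitive is modelled by `random.Random.random()` on [0, 1), and the function names `let_uniform_double` and `two_independent_draws` are hypothetical, not part of the MK language.

```python
import random

def let_uniform_double(rng: random.Random) -> float:
    """Mimics `let x = uniform in x + x`: draw once, then reuse the value."""
    x = rng.random()   # the single sampling step performed by `let`
    return x + x       # both occurrences of x refer to the same sample

def two_independent_draws(rng: random.Random) -> float:
    """Contrast: adding two separate draws instead of reusing one."""
    return rng.random() + rng.random()

rng = random.Random(0)
doubled = [let_uniform_double(rng) for _ in range(5)]
```

Every value in `doubled` is exactly twice a single draw, whereas `two_independent_draws` follows a different (triangular) distribution on the same interval.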

### 3.2 A Linear Language

Our second language is a linear simply-typed λ-calculus whose syntax is given in Figure 2, with the usual typing rules shown in Figure 5 in Appendix A. It can be interpreted in every symmetric monoidal closed category, as shown in Figure 7, also in Appendix A, with types interpreted by

$$\begin{aligned} ⟦1⟧ &= 1\\ ⟦\underline{\tau}\_1 \otimes \underline{\tau}\_2⟧ &= ⟦\underline{\tau}\_1⟧ \otimes ⟦\underline{\tau}\_2⟧ \\ ⟦\underline{\tau}\_1 \multimap \underline{\tau}\_2⟧ &= ⟦\underline{\tau}\_1⟧ \multimap ⟦\underline{\tau}\_2⟧ \end{aligned}$$

and the contexts are interpreted using ⊗ over the interpretation of the types. Once again, we are aiming at generality instead of expressivity. In a concrete setting it would be fairly easy to extend the calculus with a datatype **N** for natural numbers and probabilistic primitives such as · ⊢ coin : **N**, which flips a fair coin.

The idea behind the particular linear logic models that we are interested in is that, by integration, Markov kernels can be seen as linear operators between vector spaces of probability distributions. As such, an LL program x : **N** ⊢<sub>LL</sub> t : **N** will be denoted by a linear function between distributions over the natural numbers. Therefore, from a programming point of view, variables are placeholders for probability distributions, i.e. computations, not values, and sampling occurs when variables are used.
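A finite, discrete sketch may help to see why integration against a kernel gives a linear map. The code below assumes distributions over a finite set are lists of exact rational weights and kernels are row-stochastic matrices; the names `push` and `noisy_flip` are hypothetical and only serve the illustration.

```python
from fractions import Fraction as F

# A distribution over {0, ..., n-1} is a list of weights.
# A Markov kernel is a row-stochastic matrix: kernel[a][b] = P(b | a).
def push(kernel, dist):
    """Integrate the kernel against dist: result[b] = sum_a dist[a] * kernel[a][b]."""
    cols = len(kernel[0])
    return [sum(dist[a] * kernel[a][b] for a in range(len(dist)))
            for b in range(cols)]

noisy_flip = [[F(9, 10), F(1, 10)],
              [F(2, 10), F(8, 10)]]

d1 = [F(1), F(0)]
d2 = [F(0), F(1)]
mix = [F(1, 2) * p + F(1, 2) * q for p, q in zip(d1, d2)]

# Linearity: pushing a mixture equals mixing the pushed distributions.
lhs = push(noisy_flip, mix)
rhs = [F(1, 2) * p + F(1, 2) * q
       for p, q in zip(push(noisy_flip, d1), push(noisy_flip, d2))]
```

In this finite setting the integral is just a matrix-vector product, which is why the induced map on distributions preserves convex combinations.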

#### 3.3 Combining Languages

The main drawback of the linear calculus above is that the syntactic linearity restriction makes it hard to program with it, while the main drawback of the Markov language is that it does not have higher-order functions. In this section we will show how we can combine both languages so that we get a calculus with looser linearity restrictions while still being higher-order.

τ := 1 | τ × τ

<u>τ</u> := 1 | ℳτ | <u>τ</u> ⊸ <u>τ</u> | <u>τ</u> ⊗ <u>τ</u>

M,N := x | unit | let x = M in N | f(M) | (M,N) | π1M | π2M

t, u := x | unit | λx. t | t u | t ⊗ u | let x ⊗ y = t in u | sample t<sub>i</sub> as x<sub>i</sub> in M

Fig. 3: Syntax LL+MK

As we will show in Section 5, when looking at concrete models for these languages we can see that the semantic interpretations of variables in both languages are completely different: in the MK language variables should be thought of as values, i.e. the values that were sampled from a distribution, whereas in the LL language, variables of ground type are distributions. In order to bridge these languages we must use the observation that Markov kernels — i.e. open MK terms — have a natural resource-aware interpretation of being "sample-once" stochastic processes and, by integration, can be seen as linear maps between measure spaces — i.e. open LL terms. The combined syntax for the language is depicted in Figure 3.

We now have a language design problem: we want to capture the fact that every open MK program is, semantically, also an open LL term. The naive typing rule is:

$$\frac{x\_1 : \tau\_1, \dots, x\_n : \tau\_n \vdash\_{MK} M : \tau}{x\_1 : \mathcal{M}\tau\_1, \dots, x\_n : \mathcal{M}\tau\_n \vdash\_{LL} \mathsf{MK}(M) : \mathcal{M}\tau}$$

The problem with this rule is that it breaks substitution: the variables in the premise are MK variables whereas the ones in the conclusion are LL variables.

We solve this problem by making the syntax reflect a common idiom of PPLs: compute distributions (elements of ℳτ), sample from them and then use the results in a non-linear continuation. This is captured by the following syntax:

$$\text{sample } t\_1, \dots, t\_n \text{ as } x\_1, \dots, x\_n \text{ in } M$$

Note that we are sampling from LL programs t<sub>i</sub> (possibly an empty list), outputting the results to MK variables x<sub>i</sub> and binding them in an MK program M. When clear from the context we simply write sample t<sub>i</sub> as x<sub>i</sub> in M. Its corresponding typing rule is:

$$\frac{x\_1:\tau\_1, \cdots, x\_n:\tau\_n \vdash\_{MK} M: \tau \qquad \Gamma\_i \vdash\_{LL} t\_i: \mathcal{M} \tau\_i \quad (1 \le i \le n)}{\Gamma\_1, \cdots, \Gamma\_n \vdash\_{LL} \text{sample } t\_i \text{ as } x\_i \text{ in } M: \mathcal{M} \tau}\ \text{SAMPLE}$$

As the typing rule suggests, its semantics should be some sort of composition. However, since we are composing programs that are interpreted in different categories, we must have a way of translating MK programs into LL programs; as we will see in Section 4, this translation will be functorial. The operational interpretation of this rule is that we have a set of distributions {t<sub>i</sub>} defined using the linear language, possibly using higher-order programs; we sample from them and bind the samples to the variables {x<sub>i</sub>} in the MK program M, where there are no linearity restrictions. Note that the rule above looks very similar to a monadic composition, though they are semantically different (cf. Section 4).

With this new syntax we can finally program in accordance with our new resource interpretation of linear logic, allowing us to write the program

sample coin as x in (x = x),

which flips a coin once and tests the result for equality with itself, making it equivalent to true.
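The operational reading of the sample construct can be sketched for finite discrete distributions. This is a sketch under assumptions, not the semantics defined in the paper: distributions are Python dicts from values to exact probabilities, an MK continuation is a Python function returning such a dict, and the names `sample_as`, `dirac`, `same` and `independent` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def sample_as(dists, kernel):
    """Sketch of `sample t_1..t_n as x_1..x_n in M`.

    dists  : list of dicts value -> probability (the LL programs t_i)
    kernel : function from the sampled values to a dict value -> probability
             (the MK program M with free variables x_1..x_n)
    """
    out = {}
    for combo in product(*(d.items() for d in dists)):
        values = [v for v, _ in combo]
        weight = F(1)
        for _, p in combo:
            weight *= p                      # independent draws: product of weights
        for result, q in kernel(*values).items():
            out[result] = out.get(result, F(0)) + weight * q
    return out

coin = {0: F(1, 2), 1: F(1, 2)}
dirac = lambda v: {v: F(1)}

# sample coin as x in (x = x): one flip, used twice.
same = sample_as([coin], lambda x: dirac(x == x))
# Two separate flips compared with each other, for contrast.
independent = sample_as([coin, coin], lambda x, y: dirac(x == y))
```

With one flip bound to `x`, the comparison is certain to succeed, while comparing two independent flips succeeds only half of the time.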

This combined calculus enjoys the expected syntactic properties<sup>2</sup>.

Theorem 3. Let Γ, x : <u>τ</u><sub>1</sub> ⊢<sub>LL</sub> t : <u>τ</u> and Δ ⊢<sub>LL</sub> u : <u>τ</u><sub>1</sub> be well-typed terms. Then Γ, Δ ⊢<sub>LL</sub> t{x/u} : <u>τ</u>.

Proof. The proof can be found in Appendix D.

The following example illustrates how we can use the MK language to duplicate and discard linear variables.

Example 1. The program which samples from a distribution t and then returns a perfectly correlated pair is given by:

> · ⊢<sub>LL</sub> sample t as x in (x, x) : ℳ(τ × τ)

Similarly, the program that samples from a distribution t and does not use its sampled value is represented by the term

> · ⊢<sub>LL</sub> sample t as x in unit : ℳ1

Example 2. Suppose that we have a Markov kernel given by an open MK term x : **N** ⊢<sub>MK</sub> M : **N**. If we want to encapsulate it as a linear program of type ℳ**N** ⊸ ℳ**N** we can write:

> · ⊢<sub>LL</sub> λ meas. (sample meas as x in M) : ℳ**N** ⊸ ℳ**N**

Example 3. As we explain in the introduction, Dahlqvist and Kozen must add many primitives to their language to work around their linearity restrictions. For instance, in order to write projection functions **R**<sup>n</sup> → **R**<sup>m</sup>, n > m, they must add projection primitives to the language.

<sup>2</sup> To avoid visually polluting the proofs we will drop the color code in Theorem 3 and Theorem 7

By having compositional type constructors that can represent joint distributions, i.e. ℳ(τ × τ), it is possible to write the program sample t as x in (π<sub>1</sub> x, π<sub>3</sub> x), which samples from a distribution over triples and returns only the first and third components, using only the syntax of products in MK.

Unfortunately there are some aspects of this language that still are restrictive. For instance, imagine that we want to write an LL program that receives two "Markov kernels" ℳ**N** ⊸ ℳ**N** and a distribution over **N** as inputs, samples from the input distribution, feeds the result to the Markov kernels, samples from them and adds the results. Its type would be

$$(\mathcal{M}\mathbb{N}\multimap\mathcal{M}\mathbb{N})\multimap(\mathcal{M}\mathbb{N}\multimap\mathcal{M}\mathbb{N})\multimap\mathcal{M}\mathbb{N}\multimap\mathcal{M}\mathbb{N}$$

Even though the program only requires you to sample once from each distribution, it is still not possible to write it in the linear language.

We will show in Section 4 how the type constructor M actually corresponds to an applicative functor [19], and the limitation above is actually a particular case of a fundamental difference between programming with applicative functors compared to programming with monads.

Remark 1. We now have two languages that can interpret probabilistic primitives such as coin. However, every primitive M in the MK language can be easily transported to an LL program by using an empty list of LL programs: sample \_ as \_ in M. Therefore it makes sense to only add these primitives to the MK language.

## 4 Categorical Semantics

As is the case with categorical interpretations of languages/logics, types and contexts are interpreted as objects in a category and every well-typed program/proof gives rise to a morphism.

In our case, MK types τ are interpreted as objects ⟦τ⟧ in a Markov category (**M**, ×) and well-typed programs Γ ⊢<sub>MK</sub> M : τ are interpreted as an **M** morphism ⟦Γ⟧ → ⟦τ⟧, as shown in Figure 6. Similarly, LL types <u>τ</u> are interpreted as objects ⟦<u>τ</u>⟧ in a model of linear logic (**C**, ⊗, ⊸) and well-typed programs Γ ⊢<sub>LL</sub> t : <u>τ</u> are interpreted as a **C** morphism ⟦Γ⟧ → ⟦<u>τ</u>⟧, as shown in Figure 7.

To give semantics to the combined language is not as straightforward. The sample rule allows the programmer to run LL programs, bind the results to MK variables and use said variables in an MK continuation. The implication of this rule in our formalism is that our semantics should provide a way of translating MK programs into LL programs. In category theory this is usually achieved by a functor M : **M** → **C**.

However, we can easily see that functors are not enough to interpret the sample rule. Consider what happens when you apply ℳ to an MK program x : τ<sub>1</sub>, y : τ<sub>2</sub> ⊢<sub>MK</sub> N : τ:

$$\mathcal{M}⟦N⟧: \mathcal{M}(\tau\_1 \times \tau\_2) \to \mathcal{M}\tau$$

#### 100 P. H. A. de Amorim

To precompose it with two LL programs outputting ℳτ<sub>1</sub> and ℳτ<sub>2</sub> we need a mediating morphism μ<sub>τ<sub>1</sub>,τ<sub>2</sub></sub> : ℳτ<sub>1</sub> ⊗ ℳτ<sub>2</sub> → ℳ(τ<sub>1</sub> × τ<sub>2</sub>). Furthermore, if N has three or more free variables, there would be several ways of applying μ. Since from a programming standpoint it should not matter how the LL programs are associated, we require that μ<sub>τ<sub>1</sub>,τ<sub>2</sub></sub> makes the lax monoidality diagrams commute. Therefore, assuming lax monoidality of μ, we can interpret the sample rule:

$$\frac{\tau\_1 \times \cdots \times \tau\_n \xrightarrow{N} \tau \quad \quad \Gamma\_i \xrightarrow{t\_i} \mathcal{M}\tau\_i}{\Gamma \xrightarrow{t\_1 \otimes \cdots \otimes t\_n} \mathcal{M}\tau\_1 \otimes \cdots \otimes \mathcal{M}\tau\_n \xrightarrow{\mu} \mathcal{M}(\tau\_1 \times \cdots \times \tau\_n) \xrightarrow{\mathcal{M}N} \mathcal{M}\tau}$$

In case it only has one MK variable, the semantics is given by ⟦t⟧; ℳ⟦N⟧ and in case it does not have any free variables the semantics is ε; ℳ⟦N⟧, where ε is the unit map of the lax monoidal structure.
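The requirement that the association of the LL programs should not matter can be checked concretely in the finite discrete case. The sketch below assumes distributions are dicts with exact rational weights and takes the mediating map to be the product measure; `mu` and `reassoc` are hypothetical helper names.

```python
from fractions import Fraction as F

def mu(p, q):
    """Product measure: a candidate mediating map M(X) (x) M(Y) -> M(X x Y)."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def reassoc(d):
    """Push a distribution on ((x, y), z) to one on (x, (y, z))."""
    return {(x, (y, z)): w for ((x, y), z), w in d.items()}

a = {0: F(1, 2), 1: F(1, 2)}
b = {"h": F(1, 3), "t": F(2, 3)}
c = {True: F(1, 4), False: F(3, 4)}

# The two ways of combining three distributions agree up to re-association.
left = reassoc(mu(mu(a, b), c))
right = mu(a, mu(b, c))
```

This is the associativity diagram of a lax monoidal functor, instantiated on three concrete distributions.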

The equational theory of the LL language is the well-known theory of the simply-typed λ-calculus, and the MK equational theory has been described, in graphical notation, by Fritz [12]. What is not obvious is how they interact at their boundary. This is where ℳ being a functor becomes relevant, since the following two program equivalences follow from functoriality:

Theorem 4. Let t, M and N be well-typed programs. Then

$$⟦(\lambda z.\ \texttt{sample } z\ \texttt{as } y\ \texttt{in } N)\ (\texttt{sample } t\ \texttt{as } x\ \texttt{in } M)⟧ = ⟦\texttt{sample } t\ \texttt{as } x\ \texttt{in } (\text{let } y = M\ \texttt{in } N)⟧$$

Proof.

$$\begin{aligned} &⟦(\lambda z.\ \texttt{sample } z\ \texttt{as } y\ \texttt{in } N)\ (\texttt{sample } t\ \texttt{as } x\ \texttt{in } M)⟧ =\\ &⟦t⟧; \mathcal{M}⟦M⟧; \mathcal{M}⟦N⟧ = ⟦t⟧; \mathcal{M}(⟦M⟧; ⟦N⟧) =\\ &⟦\texttt{sample } t\ \texttt{as } x\ \texttt{in } (\text{let } y = M\ \texttt{in } N)⟧ \end{aligned}$$

Theorem 5. Let t be a well-typed program. Then

⟦sample t as x in x⟧ = ⟦t⟧

Proof. ⟦sample t as x in x⟧ = ⟦t⟧; ℳ(⟦x⟧) = ⟦t⟧; ℳ(id) = ⟦t⟧; id = ⟦t⟧.
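Both equivalences rest on the two functor laws, which can be checked numerically in a finite setting. The sketch below assumes kernels are row-stochastic matrices with exact rational entries and that the functor sends a kernel to the linear map it induces on distributions; `compose`, `lift` and the example matrices are hypothetical.

```python
from fractions import Fraction as F

def compose(f, g):
    """Kernel composition f ; g (first f, then g), as a matrix product."""
    return [[sum(f[a][b] * g[b][c] for b in range(len(g)))
             for c in range(len(g[0]))]
            for a in range(len(f))]

def lift(kernel):
    """Action on a kernel: the linear map on distributions that it induces."""
    return lambda dist: [sum(dist[a] * kernel[a][b] for a in range(len(kernel)))
                         for b in range(len(kernel[0]))]

identity = [[F(1), F(0)], [F(0), F(1)]]
f = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
g = [[F(1, 4), F(3, 4)], [F(1), F(0)]]
t = [F(1, 3), F(2, 3)]

two_steps = lift(g)(lift(f)(t))      # t ; M(f) ; M(g)
one_step = lift(compose(f, g))(t)    # t ; M(f ; g)
unchanged = lift(identity)(t)        # t ; M(id)
```

Here `two_steps == one_step` mirrors the proof of Theorem 4 and `unchanged == t` mirrors the proof of Theorem 5.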

Furthermore, we also have a modularity property that can be easily proven:

Theorem 6. Let t, M and N be well-typed programs. If ⟦M⟧ = ⟦N⟧ then

> ⟦sample t as x in M⟧ = ⟦sample t as x in N⟧

The expected compositionality of the semantics also holds:

Theorem 7. Let x<sub>1</sub> : τ<sub>1</sub>, ···, x<sub>n</sub> : τ<sub>n</sub> ⊢ t : τ and Γ<sub>i</sub> ⊢ t<sub>i</sub> : τ<sub>i</sub> be well-typed terms. Then

$$⟦\Gamma\_1, \cdots, \Gamma\_n \vdash t\{\vec{x}/\vec{t}\} : \tau⟧ = \left(⟦\Gamma\_1 \vdash t\_1 : \tau\_1⟧ \otimes \cdots \otimes ⟦\Gamma\_n \vdash t\_n : \tau\_n⟧\right); ⟦x\_1 : \tau\_1, \cdots, x\_n : \tau\_n \vdash t : \tau⟧.$$

$$\frac{\Gamma, x : \tau' \vdash t : \tau \qquad \Gamma \vdash u\_1 \equiv u\_2 : \tau'}{\Gamma \vdash t \{x/u\_1\} \equiv t \{x/u\_2\} : \tau}\ \text{SUBST}$$

From this theorem we can conclude:

Corollary 1. The Subst rule shown above is sound with respect to the categorical semantics.

Lax monoidal functors, under the name applicative functors, are widely used in programming languages research [19]. They are often used to define embedded domain-specific languages (eDSL) within a host language. This suggests that from a design perspective the Markov kernel language can be thought of as an eDSL inside a linear language.

We have just shown that M being lax monoidal is sufficient to give semantics to our combined language, but what would happen if it had even more structure? If it were also full it would be possible to add a reification command<sup>3</sup>:

$$\frac{\mathcal{M}\Gamma \vdash\_{LL} t : \mathcal{M}\tau}{\Gamma \vdash\_{MK} \text{reify}(t) : \tau}$$

where ℳΓ is notation for every variable in Γ being of the form ℳτ′, for some τ′. The semantics for the rule would be taking the inverse image of ℳ. As we will show in the next section, there are some concrete models where ℳ is full and some other models where it is not. Computationally, fullness of ℳ can be interpreted as every program of type ℳτ ⊸ ℳτ being equal to a Markov kernel.

A property which is easier to satisfy is faithfulness, which is verified by both models in the next section. In this case the translation of the MK language into the LL language would be fully-abstract in the following sense:

Theorem 8. Let x : τ<sub>1</sub> ⊢<sub>MK</sub> M : τ<sub>2</sub> and x : τ<sub>1</sub> ⊢<sub>MK</sub> N : τ<sub>2</sub> be two well-typed MK programs. If ℳ is faithful then ⟦sample y as x in M⟧ = ⟦sample y as x in N⟧ implies ⟦M⟧ = ⟦N⟧.

Proof. ⟦sample y as x in M⟧ = ⟦sample y as x in N⟧ ⟹ id<sub>ℳτ<sub>1</sub></sub>; ℳ⟦M⟧ = id<sub>ℳτ<sub>1</sub></sub>; ℳ⟦N⟧ ⟹ ⟦M⟧ = ⟦N⟧.

## 5 Concrete Models

In this section we show how existing models for both discrete as well as continuous probabilities fit within our formalism.

<sup>3</sup> The proposed rule breaks the substitution theorem, but it is possible to define a variant for it where this is not the case.

## 5.1 Discrete Probability

For the sake of simplicity we will denote the monoidal product of **CountStoch** as ×.

The probabilistic coherence space model of linear logic has been extensively studied in the context of semantics of discrete probabilistic languages [9].

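Definition 10 (Probabilistic Coherence Spaces [9]). *A probabilistic coherence space (PCS) is a pair* (|X|, P(X)) *where* |X| *is a countable set, called the* web*, and* P(X) ⊆ |X| → **R**<sup>+</sup> *is a set such that:*

- *for every* a ∈ |X| *there is* ε<sub>a</sub> > 0 *such that* ε<sub>a</sub> · δ<sub>a</sub> ∈ P(X)*, where* δ<sub>a</sub> *is the Dirac function at* a*;*
- *for every* a ∈ |X| *there is* λ<sub>a</sub> *such that* x(a) ≤ λ<sub>a</sub> *for every* x ∈ P(X)*;*
- P(X) = P(X)<sup>⊥⊥</sup>*, where* P(X)<sup>⊥</sup> = {u : |X| → **R**<sup>+</sup> | Σ<sub>a∈|X|</sub> u(a)v(a) ≤ 1 *for every* v ∈ P(X)}*.*

We can define a category **PCoh** where objects are probabilistic coherence spaces and morphisms X ⊸ Y are matrices f : |X| × |Y| → **R**<sup>+</sup> such that for every v ∈ P(X), (f v) ∈ P(Y), where (f v)<sub>b</sub> = Σ<sub>a∈|X|</sub> f(a, b) v<sub>a</sub>.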

Definition 11. *Let* (|X|, P(X)) *and* (|Y|, P(Y)) *be PCS. We define* X ⊗ Y = (|X| × |Y|, {x ⊗ y | x ∈ P(X), y ∈ P(Y)}<sup>⊥⊥</sup>)*, where* (x ⊗ y)(a, b) = x(a)y(b)*.*

Lemma 1. *Let* X *be a countable set. The pair* (X, {μ : X → **R**<sup>+</sup> | Σ<sub>x∈X</sub> μ(x) ≤ 1}) *is a PCS.*

*Proof.* The first two points are obvious, as the Dirac measure is a subprobability measure and every subprobability measure is bounded above by the constant function μ<sub>1</sub>(x) = 1.

To prove the last point we use the easy-to-prove fact that P(X) ⊆ P(X)<sup>⊥⊥</sup>. Therefore we must only prove the other direction. First, observe that if μ ∈ {μ : X → **R**<sup>+</sup> | Σ<sub>x∈X</sub> μ(x) ≤ 1}, then Σ<sub>x∈X</sub> μ(x)μ<sub>1</sub>(x) = Σ<sub>x∈X</sub> 1 · μ(x) = Σ<sub>x∈X</sub> μ(x) ≤ 1, and hence μ<sub>1</sub> ∈ {μ : X → **R**<sup>+</sup> | Σ<sub>x∈X</sub> μ(x) ≤ 1}<sup>⊥</sup>.

Let μ̃ ∈ {μ : X → **R**<sup>+</sup> | Σ<sub>x∈X</sub> μ(x) ≤ 1}<sup>⊥⊥</sup>. By definition, Σ<sub>x∈X</sub> μ̃(x) = Σ<sub>x∈X</sub> μ̃(x)μ<sub>1</sub>(x) ≤ 1 and, therefore, the third point holds.

This lemma can be used to give semantics to probabilistic primitives. For instance, a fair coin is interpreted as the function coin : **N** → [0, 1] which is 0.5 at 0 and at 1, and 0 elsewhere; it is an element of P(**N**).
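As a small sanity check of this interpretation, the following sketch represents finitely supported functions **N** → **R**<sup>+</sup> as Python dicts and tests membership in the set of Lemma 1. The helper names `is_subprobability` and `pairing` are hypothetical, and only finitely supported functions are covered.

```python
from fractions import Fraction as F

# Finitely supported functions N -> R+ as dicts; missing keys mean 0.
coin = {0: F(1, 2), 1: F(1, 2)}

def is_subprobability(mu):
    """Membership in the set of Lemma 1: non-negative weights, total mass <= 1."""
    return all(w >= 0 for w in mu.values()) and sum(mu.values()) <= 1

def pairing(u, v):
    """sum_a u(a) * v(a), the pairing used to define the orthogonal."""
    return sum(w * v.get(a, F(0)) for a, w in u.items())

# The constant function 1, restricted to the support of coin.
ones_on_support = {a: F(1) for a in coin}
```

Pairing `coin` with the constant function 1 returns its total mass, which is the quantity bounded by 1 in the proof of Lemma 1.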

Lemma 2. *Let* X → Y *be a* **CountStoch** *morphism. It is also a* **PCoh** *morphism.*

Theorem 9. *There is a lax monoidal functor* ℳ : **CountStoch** → **PCoh***.*

*Proof.* The functor is defined using the lemmas above. Functoriality holds due to the functor being the identity on arrows. The lax monoidal structure is given by ε = id<sub>1</sub> and μ<sub>X,Y</sub> = id<sub>X×Y</sub>.

Lemma 3. *If* μ ∈ {x ⊗ y | x ∈ ℳ(X), y ∈ ℳ(Y)}<sup>⊥</sup> *then for every* x ∈ X *and* y ∈ Y*,* μ(x, y) ≤ 1*.*

*Proof.* If there were indices x<sub>1</sub>, y<sub>1</sub> such that μ(x<sub>1</sub>, y<sub>1</sub>) > 1 then Σ<sub>x,y</sub> μ(x, y)(δ<sub>x<sub>1</sub></sub> ⊗ δ<sub>y<sub>1</sub></sub>)(x, y) ≥ μ(x<sub>1</sub>, y<sub>1</sub>)(δ<sub>x<sub>1</sub></sub> ⊗ δ<sub>y<sub>1</sub></sub>)(x<sub>1</sub>, y<sub>1</sub>) = μ(x<sub>1</sub>, y<sub>1</sub>) > 1, which is a contradiction.

Lemma 4. *Let* X *and* Y *be two countable sets, then*

$$\mathcal{M}X \otimes \mathcal{M}Y = \left( X \times Y, \{ \mu : X \times Y \to \mathbb{R}^+ \mid \sum\_{x \in X} \sum\_{y \in Y} \mu(x, y) \le 1 \} \right) = \mathcal{M}(X \times Y).$$

*Proof.* By the lemma above it follows that if we have a joint probability distribution μ̃ over X × Y and an element μ ∈ {x ⊗ y | x ∈ ℳ(X), y ∈ ℳ(Y)}<sup>⊥</sup>, then Σ<sub>x,y</sub> μ(x, y)μ̃(x, y) ≤ Σ<sub>x,y</sub> μ̃(x, y) ≤ 1.

Theorem 10. *Both* ε *and* μ<sub>X,Y</sub> *are isomorphisms.*

*Proof.* Since ε is the identity morphism, it is trivially an isomorphism. The morphisms μ<sub>X,Y</sub> being isomorphisms is a direct consequence of the lemmas above.

Theorem 11. *The functor* ℳ *is full.*

Both results above can be directly used to enhance the syntax of the combined language. From Theorem 10 we can conclude that elements of type ℳ(τ<sub>1</sub> × τ<sub>2</sub>), by projecting their marginal distributions, can be manipulated as if they had type ℳτ<sub>1</sub> ⊗ ℳτ<sub>2</sub>. Something to note is that when we do this marginalization process we lose potential correlations between the elements of the pair.
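The loss of correlation under marginalization is easy to observe on a two-point example. The sketch below assumes joint distributions are dicts keyed by pairs with exact rational weights; `marginals` and `product_measure` are hypothetical helper names.

```python
from fractions import Fraction as F

def marginals(joint):
    """Project a joint distribution on pairs onto its two marginals."""
    left, right = {}, {}
    for (x, y), p in joint.items():
        left[x] = left.get(x, F(0)) + p
        right[y] = right.get(y, F(0)) + p
    return left, right

def product_measure(p, q):
    """Independent combination of two distributions."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

# A perfectly correlated pair of fair bits.
correlated = {(0, 0): F(1, 2), (1, 1): F(1, 2)}
m1, m2 = marginals(correlated)
rebuilt = product_measure(m1, m2)
```

Both marginals are fair coins, but recombining them yields the uniform distribution on four outcomes rather than the original correlated pair.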

#### 5.2 Continuous Probability

In order to accommodate continuous distributions we can use regularly ordered Banach spaces, whose detailed definition goes beyond the scope of this paper.

Definition 12 ([8]). *The category* **RoBan** *has regularly ordered Banach spaces as objects and regular linear functions as morphisms.*

Theorem 12. *There is a lax monoidal functor* M : **Kern** → **RoBan***.*

*Proof.* The functor acts on objects by sending a measurable space to the set of signed measures over it, which can be equipped with a **RoBan** structure. On morphisms it sends a Markov kernel f to the linear function ℳ(f)(μ) = ∫ f dμ.

The monoidal structure of **RoBan** satisfies the universal property of tensor products and, therefore, we can define the natural transformation μ<sub>X,Y</sub> : ℳ(X) ⊗ ℳ(Y) → ℳ(X × Y) as the function generated by the bilinear function ℳ(X) × ℳ(Y) → ℳ(X × Y) which maps a pair of distributions to its product measure. The map ε is, once again, equal to the identity function.

The commutativity of the lax monoidality diagrams follows from the universal property of the tensor product: it suffices to verify it for elements μ<sub>A</sub> ⊗ μ<sub>B</sub> ⊗ μ<sub>C</sub>.

In **RoBan** the uniform distribution over the interval [0, 1] is an element of ℳ**R**, meaning that it can soundly interpret a · ⊢<sub>LL</sub> uniform : ℳ**R** primitive.

Even though ℳ looks very similar to the discrete case, it follows from a well-known theorem from functional analysis that the functor is not strong monoidal, meaning that there are joint probability distributions (elements of ℳ(A × B)) that cannot be represented as an element of the tensor product ℳ(A) ⊗ ℳ(B) and, as such, programs of type ℳ(A × B) must be manipulated in the MK language, as shown in Example 3.

## 6 Beyond Probability

We have seen that this new resource interpretation is present in different linear logic models for probabilistic programming. In this section we show that this model can be generalized to commutative effects, i.e. effects where the program equation Commutativity below holds. Categorically, these effects are captured by monoidal monads<sup>4</sup>. Due to length issues, we will not fully detail the definition of monoidal monads, but we refer the interested reader to Seal [23].

$$\frac{\Gamma \vdash t\_1 : \tau\_1 \qquad \Gamma \vdash t\_2 : \tau\_2 \qquad \Gamma, x\_1 : \tau\_1, x\_2 : \tau\_2 \vdash u : \tau}{\text{let } x\_1 = t\_1 \text{ in } (\text{let } x\_2 = t\_2 \text{ in } u) \equiv \text{let } x\_2 = t\_2 \text{ in } (\text{let } x\_1 = t\_1 \text{ in } u) : \tau}\ \text{COMMUTATIVITY}$$

Definition 13 ([23]). Let (**C**, ⊗, I) be a monoidal category and (T, η, μ) a monad over it. The monad T is called monoidal if it comes equipped with a natural transformation κ<sub>X,Y</sub> : TX ⊗ TY → T(X ⊗ Y) making certain diagrams commute.

For probability monads the transformation κ corresponds to forming the product probability distribution and, more generally, this can be thought of as a program that runs both of its (effectful) inputs and pairs the outputs.

Every monad gives rise to two interesting categories **C**<sub>T</sub> and **C**<sup>T</sup> which are, respectively, the Kleisli category and the Eilenberg-Moore category. The objects of **C**<sub>T</sub> are the same as those of **C** and morphisms between A and B are **C** morphisms A → TB, with the identity morphism being equal to the unit η of the monad and composition given by f; g = f; Tg; μ.
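The Kleisli composition formula f; Tg; μ can be made concrete with a small sketch. It uses the finite powerset monad on Python values as a stand-in for T (the powerset monad is discussed later in this section); the names `unit`, `kleisli`, `succ_or_same` and `double_or_zero` are hypothetical.

```python
def unit(x):
    """eta: the unit of the (finite) powerset monad."""
    return frozenset([x])

def kleisli(f, g):
    """f ; g in the Kleisli category, computed as f ; T g ; mu."""
    def composed(a):
        images = [g(b) for b in f(a)]        # T g applied to the set f(a)
        return frozenset().union(*images)    # mu: flatten a set of sets
    return composed

succ_or_same = lambda n: frozenset([n, n + 1])
double_or_zero = lambda n: frozenset([0, 2 * n])

both = kleisli(succ_or_same, double_or_zero)
```

Composing with `unit` on either side leaves a morphism unchanged, which is the statement that η is the identity of the Kleisli category.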

The objects of the category **C**<sup>T</sup> are pairs (X, x), where X is a **C** object and x : TX → X is a **C** morphism such that μ; x = Tx; x and η; x = id<sub>X</sub>, and morphisms between objects (X, x) and (Y, y) are **C** morphisms f : X → Y such that x; f = Tf; y.

For every monad T there is a canonical inclusion functor ι : **C**<sub>T</sub> → **C**<sup>T</sup> which maps X to (TX, μ) and f : X → Y to Tf; μ<sub>Y</sub>.

Theorem 13 ([5]). The functor ι is full and faithful.

<sup>4</sup> Monoidal monads are equivalent to commutative monads, which is the nomenclature usually used in the context of programming languages semantics.

As we explain in Appendix C, assuming enough structure on the category **C** we can show that the triple (**C**<sub>T</sub>, **C**<sup>T</sup>, ι) is a model of the MK+LL language and we can bring our new resource interpretation of linear logic to other commutative effects.

An illustrative example is the powerset monad P : **Set** → **Set**, which is monoidal. Since **Set** has the necessary structure, the triple (**C**<sub>P</sub>, **C**<sup>P</sup>, P) is a model of our language and can be used to give semantics to non-deterministic computation.

In the context of commutative effects other than randomness, the syntax sample t as x in M does not make as much sense, in which case we can use the syntax observe t<sub>i</sub> as x<sub>i</sub> in M instead. Once again, operationally, the programs t<sub>i</sub> are fully executed and the values are bound to x<sub>i</sub> in M, which is then executed.

Furthermore, other effects have other relevant effectful operations and, therefore, we can assume that there is a set of operations in the MK language that are interpreted in the Kleisli category and can be transported to LL using observe, similar to how it was done in the probabilistic case.

For the non-deterministic case we can assume the existence of typing rules for non-deterministic choice and failure:

$$\frac{\Gamma \vdash\_{MK} t\_1 : \tau \qquad \Gamma \vdash\_{MK} t\_2 : \tau}{\Gamma \vdash\_{MK} t\_1 \oplus t\_2 : \tau}\ \text{CHOICE} \qquad \frac{}{\Gamma \vdash\_{MK} 0\_\tau : \tau}\ \text{NUL}$$

satisfying the expected equations and interpreted using set-theoretic union and the empty set, respectively.
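For the non-deterministic instance, the operations and the observe construct can be sketched with finite sets. This is an illustration under assumptions: computations are Python frozensets of possible outcomes, and `choice`, `failure` and `observe` are hypothetical names for the union, the empty set and the n-ary binding construct.

```python
from itertools import product

def choice(s1, s2):
    """t1 (+) t2: non-deterministic choice, interpreted as union."""
    return frozenset(s1) | frozenset(s2)

failure = frozenset()   # 0_tau: the computation with no outcomes

def observe(sets, cont):
    """Sketch of `observe t_i as x_i in M`: run every t_i, bind each outcome, run M."""
    out = set()
    for values in product(*sets):
        out |= cont(*values)
    return frozenset(out)

a = choice({1}, {2})
b = choice({10}, {20})
sums = observe([a, b], lambda x, y: frozenset([x + y]))
# Observing the two computations in the opposite order gives the same result.
swapped = observe([b, a], lambda y, x: frozenset([x + y]))
```

The equality of `sums` and `swapped` is an instance of the Commutativity equation, and observing `failure` makes the whole computation fail.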

A similar connection between linear logic and monoidal monads has been made by Benton and Wadler [4], who relate Moggi's monadic λ-calculus with linear logic by showing that if a monad is monoidal and the category has equalizers and coequalizers, then the Eilenberg-Moore category is a model of linear logic.

## 7 Related Work

Semantics of Probabilistic Programming Ehrhard et al. [11,10] have defined a model of linear logic **CLin** which can be used to interpret a higher-order probabilistic programming language. They have used the call-by-name translation of intuitionistic logic into linear logic, A → B = !A ⊸ B, to give semantics to their language. The authors extend their language with a call-by-value let syntax which makes it possible to reuse sampled values. In order to give semantics to this new language they introduce a new category **CLin<sup>m</sup>** which can interpret this new operator, at the cost of complicating their model.

Because there is an analogous proof of Theorem 12 with the category **CLin** replacing **RoBan**, we can use their original, simpler, model to interpret our language, while not needing to use the linear logic exponential to interpret nonlinear programs.


Dahlqvist and Kozen [8] have defined a category of partially ordered Banach spaces and shown that it is a model of intuitionistic linear logic. An important difference between their approach and the one mentioned above is that they embrace variable linearity as part of their syntax. As we argued in this paper, we believe that the syntactic restriction of linearity they have used is not adequate for the purposes of probabilistic programming. They deal with this limitation by adding primitives to their languages which, by using the results of Section 5, could be programmed using the MK language.

Quasi Borel spaces [15] are a conservative extension of **Meas** that are Cartesian closed and have a commutative probability monad. The drawback of this model is that it is still not as well understood as its measure-theoretic counterpart, and there are theorems from probability theory used to reason about programs that may not hold in the category of quasi Borel spaces **QBS**.

Recently, Geoffroy [13] has made progress in connecting linear logic and quasi Borel spaces by showing that a certain subcategory of the Eilenberg-Moore category for the probability monad in **QBS** is a model of classical linear logic, which we see as an instance of our model where the MK language can have higher-order functions as well.

Call-by-Push-Value The idea of having two distinct type systems that are connected by a functorial layer is reminiscent of Call-by-Push-Value (CBPV) [17], which has a type system for values and a type system for computations that are connected by an adjunction. In recent work, Ehrhard and Tasson [24] use the Eilenberg-Moore adjunction of the linear logic exponential ! to give semantics to a calculus that can interpret lazy and eager probabilistic computation, allowing for the interpretation of an eager let operator which is operationally similar to our sample construct. However, the existence of the let operator depends on properties of the ! that are unknown to hold for continuous distributions, while our semantics can naturally deal with continuous distributions as we have shown in Section 5.

Furthermore, the exponential which lies at the center of their approach is, semantically, hard to work with and does not have any clear connections to probability theory, making it unlikely that their semantics can be seen as a bridge between the Markov and linear semantics, which is the case for the models presented in Section 5.

Goubault-Larrecq [14] has defined a CBPV domain semantics for a language that mixes probability and non-determinism, a long-standing challenge in the theory of programming languages. His focus is on understanding how to make probability interact with non-determinism in a sound way. He studies the full abstraction of his semantics but does not deal with connections to linear logic.

Acknowledgements The support of the National Science Foundation under grant CCF-2008083 is gratefully acknowledged. I would also like to thank Arthur Azevedo de Amorim, Justin Hsu, Michael Roberts, Christopher Lam and Deepak Garg for their useful comments on earlier versions of this paper.

## A Typing Rules and Denotational Semantics LL and MK





Fig. 6: Denotational semantics for MK

$$\dfrac{}{\tau \xrightarrow{id_{\tau}} \tau}\ \text{AXIOM} \qquad \dfrac{\Gamma_1 \xrightarrow{t_1} \underline{\tau_1} \quad \Gamma_2 \xrightarrow{t_2} \underline{\tau_2}}{\Gamma_1 \otimes \Gamma_2 \xrightarrow{t_1 \otimes t_2} \underline{\tau_1} \otimes \underline{\tau_2}}\ \text{TENSOR}$$

$$\dfrac{\Gamma_1 \xrightarrow{t} \underline{\tau_1} \otimes \underline{\tau_2} \quad \Gamma_2 \otimes \underline{\tau_1} \otimes \underline{\tau_2} \xrightarrow{u} \underline{\tau}}{\Gamma_1 \otimes \Gamma_2 \xrightarrow{(id \otimes t); u} \underline{\tau}}\ \text{LETTENSOR} \qquad \dfrac{\Gamma \otimes \underline{\tau_1} \xrightarrow{t} \underline{\tau_2}}{\Gamma \xrightarrow{cur(t)} \underline{\tau_1} \multimap \underline{\tau_2}}\ \text{ABSTRACTION}$$

$$\dfrac{\Gamma_1 \xrightarrow{t} \underline{\tau_1} \multimap \underline{\tau_2} \quad \Gamma_2 \xrightarrow{u} \underline{\tau_1}}{\Gamma_1 \otimes \Gamma_2 \xrightarrow{(t \otimes u); ev} \underline{\tau_2}}\ \text{APPLICATION}$$

Fig. 7: Denotational semantics for LL

## B Commutative Diagrams

Fig. 8: Lax monoidal diagrams

## C Monoidal Monads and Their Algebras

An important theorem from the categorical probability literature is that Markov categories are an abstraction of programming in the Kleisli category of monoidal affine monads, where affinity means that T1 ≅ 1.

Theorem 14 ([12]). *Let* (**C**, ×, 1) *be a cartesian category and* T : **C** → **C** *a monoidal (affine) monad. The Kleisli category* **C**<sup>T</sup> *is a Markov category.*

The monoidal product of **C**<sup>T</sup> is × with unit 1, the copy operation is given by Δ<sub>X</sub>; η<sub>X×X</sub> : X → T(X × X), and the deletion operation is given by T1 ≅ 1 together with 1 being terminal.
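As an informal illustration of Theorem 14, the following Python sketch instantiates the construction with the finitely supported distribution monad on sets, which we assume here as a standard example of a monoidal affine monad. The names `unit`, `bind`, `compose`, `copy`, `delete` and `coin` are ours and purely illustrative; they do not come from the development above.

```python
from fractions import Fraction

# A finite distribution is a dict from outcomes to probabilities.
# A Kleisli arrow X -> T(Y) is a Python function from x to such a dict.

def unit(x):
    """eta: the point-mass distribution at x."""
    return {x: Fraction(1)}

def bind(dist, kernel):
    """Kleisli extension: push a distribution through a kernel."""
    out = {}
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] = out.get(y, Fraction(0)) + p * q
    return out

def compose(f, g):
    """Kleisli composition f ; g (first f, then g)."""
    return lambda x: bind(f(x), g)

def copy(x):
    """Copy: the diagonal followed by the unit, X -> T(X x X)."""
    return unit((x, x))

def delete(x):
    """Delete: the only kernel into the one-point set, X -> T(1)."""
    return unit(())

def coin(_):
    """A fair coin, used as an example kernel."""
    return {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Affineness: discarding the result of a kernel is the same as discarding.
discard_after_coin = compose(coin, delete)(None)
# Copying a sample yields two perfectly correlated outcomes.
copied_coin = compose(coin, copy)(None)
```

In this sketch `copied_coin` puts all its mass on the diagonal, so copying a sample differs from sampling twice independently; this is the usual reason copy is not natural in a Markov category.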

Furthermore, under certain conditions, the Eilenberg-Moore category **C**<sup>T</sup> for monoidal monads is symmetric monoidal closed. The monoidal unit is given by TI, the monoidal product is given by the coequalizer depicted in Figure 9, and the closed structure is given by the equalizer depicted in Figure 10.

Theorem 15. *Let* **C** *be a symmetric monoidal closed category with equalizers, reflexive co-equalizers and* T : **C** → **C** *a monoidal monad. The category* **C**<sup>T</sup> *is also symmetric monoidal closed.*

$$T(TX \otimes TY) \xrightarrow{T\kappa} TT(X \otimes Y) \xrightarrow{\mu} T(X \otimes Y) \longrightarrow X \otimes_T Y$$

$$X \multimap_T Y \longrightarrow X \multimap Y \rightrightarrows TX \multimap Y$$

Fig. 10: Closed Structure in **C**<sup>T</sup>

Even though, in general, defining the monoidal product requires a coequalizer, for our purposes we are only interested in products of the form TA ⊗<sub>T</sub> TB which, luckily, are easier to characterize, since the equality TX ⊗<sub>T</sub> TY = T(X ⊗ Y) holds [23].

In this case the lax monoidal transformations μ<sub>X,Y</sub> : TX ⊗<sub>T</sub> TY → T(X ⊗ Y) and ε : FI → FI are simply the identity morphisms. Besides, by using the universal properties of coequalizers it is possible to show the equality α̃<sub>TX,TY,TZ</sub> = α<sub>X,Y,Z</sub>, where α̃ is the associator for the monoidal product ⊗<sub>T</sub>.

Theorem 16. *Let* **C** *be a symmetric monoidal category with reflexive co-equalizers and* T : **C** → **C** *a monoidal monad. The triple* (ι, μ, ε) *is a lax monoidal functor.*

*Proof.* The proof follows by unfolding the definitions.

## D Proofs

Theorem 3. Let Γ, x : τ<sub>1</sub> ⊢ t : τ and Δ ⊢ u : τ<sub>1</sub> be well-typed terms, then Γ, Δ ⊢ t{x/u} : τ.

Proof. The proof follows by structural induction on the typing derivation Γ, x : τ<sub>1</sub> ⊢ t : τ.

Theorem 7. Let x<sub>1</sub> : τ<sub>1</sub>, ···, x<sub>n</sub> : τ<sub>n</sub> ⊢ t : τ and Γ<sub>i</sub> ⊢ t<sub>i</sub> : τ<sub>i</sub> be well-typed terms. Then ⟦Γ<sub>1</sub>, ···, Γ<sub>n</sub> ⊢ t{x⃗<sub>i</sub>/t⃗<sub>i</sub>} : τ⟧ = (⟦Γ<sub>1</sub> ⊢ t<sub>1</sub> : τ<sub>1</sub>⟧ ⊗ ··· ⊗ ⟦Γ<sub>n</sub> ⊢ t<sub>n</sub> : τ<sub>n</sub>⟧); ⟦Γ<sub>1</sub>, ···, Γ<sub>n</sub> ⊢ t : τ⟧.

Proof. The proof follows by induction on the typing derivation of t.


## References

1. de Amorim, A.A., Gaboardi, M., Hsu, J., Katsumata, S.y.: Probabilistic relational reasoning via metrics. In: Symposium on Logic in Computer Science (LICS) (2019)


#### 112 P. H. A. de Amorim

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Formal Logic for Formal Category Theory**

Max S. New<sup>1,2</sup> and Daniel R. Licata<sup>2</sup>

<sup>1</sup> University of Michigan, Ann Arbor, USA maxsnew@umich.edu <sup>2</sup> Wesleyan University, Middletown, USA dlicata@wesleyan.edu

**Abstract.** We present a domain-specific type theory for constructions and proofs in category theory. The type theory axiomatizes notions of category, functor, profunctor and a generalized form of natural transformations. The type theory imposes an ordered linear restriction on standard predicate logic, which guarantees that all functions between categories are functorial, all relations are profunctorial, and all transformations are natural by construction, with no separate proofs necessary. Important category-theoretic proofs such as the Yoneda lemma and Co-yoneda lemma become simple type-theoretic proofs about the relationship between unit, tensor and (ordered) function types, and can be seen to be ordered refinements of theorems in predicate logic. The type theory is sound and complete for a categorical model in *virtual equipments*, which model both internal and enriched category theory. While the proofs in our type theory look like standard set-based arguments, the syntactic discipline ensures that all proofs and constructions carry over to enriched and internal settings as well.

## **1 Introduction**

Category theory is a branch of mathematics that studies higher-dimensional typed algebraic structures. Originally developed for applications to homological algebra, it was quickly discovered that categorical structures were common in logic and computer science. Formal systems like logics, type theories and programming languages typically have sound and complete models given by notions of structured categories [31,30,34]. This Curry-Howard-Lambek correspondence applies to simply typed lambda calculus [30], computational lambda calculus [34], linear logic [24], dependent type theory [14,45], and many other type theories designed based on category-theoretic semantics. The syntax of a type theory should present an initial object in its category of models, a category-theoretic reformulation of logical soundness and completeness.

While this research program has been quite successful, category-theoretic notions can be overwhelming for beginners. In a traditional set-theoretic formulation, notions such as adjoint functors and limits produce a proliferation of "naturality" and "functoriality" side-conditions that must be discharged. For example, when constructing an adjoint pair of functors between two categories, a naïve approach would define all of the data of the action on objects and the action on arrows, prove the functoriality of such actions, construct two families of transformations, prove that they are natural, and finally prove a pair of equalities relating compositions of natural transformations. Carrying out these proofs explicitly is quite tedious and many newcomers are left with the impression that category theory is full of long, but ultimately trivial constructions. This complexity is compounded when moving from ordinary category theory to enriched and internal category theory, where constructions must additionally be proven continuous, monotone, etc., on top of being natural or functorial. However, these generalizations are often exactly what is needed for programming language applications; for example, domain-, metric- and step-index-enriched categories have been used to model recursive programming languages and internal categories have been used to model parametricity and gradual typing [53,9,44,36].

Fortunately, the tools of category theory itself can be employed to simplify this complexity, specifically the tools of *higher* category theory. As an analogy in differential calculus, when an adept analyst writes down a function, they do not expand out the ε-δ definition of continuity for a function and proceed from first principles, but rather use certain *syntactic principles* for defining functions that are continuous by construction, e.g., that the composition of continuous functions is continuous. Similar principles apply to category theory itself: functors and natural transformations are closed under composition and whiskering operations, and experienced category theorists rely on these syntactic principles to eliminate the tedium of explicit proofs. In the case of category theory, these principles can be formalized using algebraic structures such as 2-categories, bicategories, Yoneda structures, (virtual) double categories, pro-arrow equipments [6,56,49,32,17], an approach known as *formal category theory*. In these structures, rather than defining notions of category, functor and natural transformation from first principles, they are axiomatized in a manner similar to how a category axiomatizes a notion of space and homomorphism. Proofs in formal category theory apply to enriched and internal settings, which are instances of the formal axioms. A downside is that these algebraic structures are quite complicated, and practitioners typically employ either an algebraic combinator syntax (formalized in [18]) or a 2-dimensional diagrammatic language that can be quite beautiful and elegant, but is also somewhat removed from the traditional formulation of category theory in terms of sets and functions.

In this work, we apply the techniques of categorical logic to define a more familiar logical syntax for carrying out constructions and proofs in formal category theory. We call the resulting theory *virtual equipment type theory* (VETT) as (hyperdoctrines of) *virtual equipments* [32,17], a particular semantic model of formal category theory, provide a sound and complete notion of model for the theory. VETT provides syntax for categories, functors, profunctors, and natural transformations, which are defined using familiar term syntax and βη reasoning principles for λ-functions, bound variables, tuples, etc. By adhering to a *syntactic discipline*, the logic guarantees that all functor terms are automatically functorial, and all natural transformation terms are natural. More specifically, the syntax for transformations is a kind of *indexed, ordered linear* lambda calculus, where the indexing ensures that transformations are correctly natural and the ordering and linearity ensure that the proofs are valid in a large class of enriched and internal categories, such as enrichment in a non-symmetric monoidal category. VETT provides an alternative to algebraic and string-diagram syntaxes for working with virtual equipments, similar to how the lambda calculus provides an alternative to categorical combinators and string diagram calculi for cartesian closed categories.

The syntax of VETT is an indexed, ordered linear, proof-relevant variant of predicate logic over a unary type theory. Just as a predicate logic has a notion of type, term, relation and implication, VETT is based on four analogous category-theoretic concepts: categories, functors, profunctors and natural transformations of profunctors. Categories are treated like types, and the unary functors we consider in this paper are each represented by a term whose type is a category and whose one free variable ranges over a category. The analog of a relation is a profunctor (defined below), which is written like a set with free category variables. Like the restriction to unary functors, we restrict to profunctors with two free variables. The logic is proof-relevant in that the implications of relations are generalized to natural transformations of profunctors, and we use a λ-calculus notation to describe these "proof terms". This analogy to predicate logic can be made formal: any construction in VETT can be erased to a corresponding construction or proof in predicate logic, as sets, functions, relations, and implication of relations define a (somewhat degenerate) virtual equipment.

While the restricted syntax developed in this paper does not express some important concepts such as functor categories or opposite categories, the restriction is natural in that it corresponds exactly to virtual equipments, a well-understood notion of model that can express a great deal of fundamental results and constructions in category theory [43,47]. Moreover, we can work around these unary/binary restrictions to some extent by viewing the type theory as a domain-specific language embedded in a metalanguage. For example, while we cannot talk about functor categories, we can state a theorem that quantifies over functors using the meta-language's "external" universal quantifier (which does not have automatic functoriality/naturality properties). To support this, VETT includes a third layer, an extensional dependent type theory in the style of Martin-Löf type theory. All of our ordered predicate logic judgments are also indexed by a context from this dependent type theory, and the type theory includes universe types for categories, functors, profunctors and natural transformations. This allows us to formalize theorems the object logic is too restrictive to encode, analogous to 2-level [51,2,39] or indexed type theories [27,15,52,29].

While we emphasize the applications to enriched and internal category theory in this work, there is potential for more direct application to programming language semantics. Ordinary predicate logic is the foundation for proof-theoretic presentations of logical relations, such as Abadi-Plotkin logic for parametricity and LSLR and Iris for step-indexed logical relations proofs [40,20,28]. We conjecture that VETT might similarly serve as the foundation for a logic of ordered structures, which abound in applications: rewriting and approximation relations can both be modeled as orderings and logical relations involving these structures are proven to respect orderings; operational logical relations must be downward-closed and approximation relations should satisfy transitivity. Just as LSLR and Iris release the user from the syntactic burden of explicit step-indexing, VETT may be used to release the user from the syntactic burden of proving downward-closure or transitivity side-conditions. Additionally, VETT may serve as the basis of a future domain-specific proof assistant for category-theoretic proofs. To pilot-test this, we have formalized the syntax of VETT in Agda 2.6.2.2, using the rewrite mechanism to make VETT's substitution and β-reduction rules definitional equalities.<sup>1</sup> We have used this lightweight implementation to check a number of examples.

**Basics of Profunctors.** While we assume the reader has some background knowledge of category theory, we briefly define profunctors, which are not included in many introductory texts. Recall that a category has a collection of objects and morphisms with identity and composition, and a functor F : ℂ → 𝔻 is a function on objects and a function on morphisms that preserves identity and composition. A category can be thought of as a generalization of a preordered set, which has a set of elements and a binary *relation* on its elements satisfying reflexivity and transitivity. A category is then a *proof-relevant preorder*, where morphisms are the proofs of ordering, and the reflexivity and transitivity proofs must satisfy associativity and unit equations. A functor is then a *proof-relevant monotone function*. Given categories ℂ and 𝔻, a profunctor R from ℂ to 𝔻, written R : ℂ ⇸ 𝔻, is a functor R : ℂ<sup>o</sup> × 𝔻 → Set.<sup>2</sup> Because a profunctor outputs a set rather than a proposition, it is itself a *proof-relevant relation*. Thinking of categories as proof-relevant preorders, functoriality says that the profunctor is downward-closed in ℂ and upward-closed in 𝔻. Given profunctors R, S : ℂ ⇸ 𝔻, a homomorphism from R to S is a natural transformation, which in the preordered setting is simply an implication of relations.
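To make the preorder reading concrete, here is a small Python sketch of the decategorified case, where a profunctor between preorders is just a relation that is downward-closed in its first argument and upward-closed in its second. The function `is_monotone_relation` and the divisibility example are our own illustrative choices, not part of VETT.

```python
from itertools import product

def divides(a, b):
    """The preorder used in this example: a <= b when a divides b."""
    return b % a == 0

def is_monotone_relation(C, leq_C, D, leq_D, R):
    """Check that R, a set of pairs in C x D, is downward-closed in C
    and upward-closed in D: R(c, d), c2 <= c and d <= d2 imply R(c2, d2)."""
    for c, d in R:
        for c2, d2 in product(C, D):
            if leq_C(c2, c) and leq_D(d, d2) and (c2, d2) not in R:
                return False
    return True

C = [1, 2, 4]
D = [1, 2, 4]

# The ordering relation itself (the "hom" of the preorder) is closed.
hom = {(c, d) for c, d in product(C, D) if divides(c, d)}
hom_is_closed = is_monotone_relation(C, divides, D, divides, hom)

# A single pair is not closed: 1 divides 2, yet (1, 2) is missing.
single_pair = {(2, 2)}
single_pair_is_closed = is_monotone_relation(C, divides, D, divides, single_pair)
```

The closure condition checked here is what functoriality of R : ℂ<sup>o</sup> × 𝔻 → Set reduces to when every hom-set and every value of R has at most one element.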

Profunctors are very useful for formalizing category theory, but an additional reason we make them a basic concept of VETT is that they allow us to give a *universal property* for the type of "morphisms in a category ℂ". This is analogous to how the J elimination rule for the identity type in Martin-Löf type theory gives a universal property for morphisms in a groupoid (the special case of a category where all morphisms are invertible) [26,5,50]. The reason profunctors are useful for this purpose is that, for any category ℂ, Hom<sub>ℂ</sub> : ℂ ⇸ ℂ is a profunctor. On preorders this is just the preorder's ordering relation itself. Moreover, the hom profunctor is the unit for a composition of profunctors RS which is defined as a *co-end*. The composition of profunctors is a generalization of the composition of relations, and just as the equality relation is the identity for the composition of relations, the hom profunctor is the identity for this composition. The unit law for the hom profunctor can be seen as a "morphism induction" principle, analogous to the "path induction" used in homotopy type theory (though in this paper we consider only ordinary 1-dimensional categories, not higher generalizations).
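The unit law can be observed directly in the preorder case. The sketch below, with names of our own choosing, composes relations on the chain 1 ≤ 2 ≤ 3 and checks that the ordering relation acts as a unit for a relation that is closed in the profunctor sense, while it fails to do so for a relation that is not.

```python
from itertools import product

def compose(R, S):
    """Relational composite: (a, c) whenever some b has R(a, b) and S(b, c)."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

elems = [1, 2, 3]

# The hom relation of the chain 1 <= 2 <= 3.
leq = {(a, b) for a, b in product(elems, elems) if a <= b}

# "Strictly below": downward-closed on the left, upward-closed on the right.
R = {(a, b) for a, b in product(elems, elems) if a + 1 <= b}

left_unit = compose(leq, R)    # equals R
right_unit = compose(R, leq)   # equals R

# A relation that is not closed is changed by composing with the ordering.
Q = {(2, 2)}
Q_after_unit = compose(leq, Q)
```

Composing with `leq` closes a relation downward on the left, which is why the unit law holds exactly for relations that were already closed.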

<sup>1</sup> https://github.com/maxsnew/virtual-equipments/blob/master/agda/STC.agda

<sup>2</sup> ℂ<sup>o</sup> is the notation we use for the opposite category of ℂ.

**Outline.** In Section 2 we introduce the syntax of VETT. In Section 3 we demonstrate how to use our syntax for formal category theory. In Section 4, we develop some model theory for VETT, including a sound and complete notion of categorical model and sound interpretation in virtual equipments modeling ordinary, enriched and internal category theory. In Section 5, we discuss related type theories and potential extensions.

## **2 Syntax of VETT**

In Figure 1 we give a table summarizing the relationship between the judgments and connectives of higher-order predicate logic and those of our ordered variant. Due to the incorporation of variance, some unordered concepts generalize to multiple different ordered notions. For instance, covariant and contravariant presheaf categories generalize the power set. Further, because we only have binary relations rather than relations of arbitrary arity, we have only restricted forms of universal and existential quantification which come combined with implications and conjunctions.


**Fig. 1.** Analogy between Higher-Order Logic and VETT Judgments and Connectives

The syntactic forms of VETT are given in Figure 2. First, we have categories, which are analogous to sorts in a first-order theory. We have a base sort M, product and unit sorts, as well as the graph of a profunctor and the negative and positive presheaf categories. Next, objects a, b, c are the syntax for the functors between categories. We call them objects rather than functors, because in type-theoretic style, a functor is viewed as a "generalized object" parameterized by an input variable α : ℂ. Next, sets P, Q, R are the syntax for profunctors, i.e., a categorification of relations. Similar to functors, rather than writing profunctors as functions ℂ<sup>o</sup> × 𝔻 → Set, we write them as sets with a contravariant variable α : ℂ and a covariant variable β : 𝔻. The sets we can define are the Hom-set, the tensor and internal hom, as well as products of sets, profunctors applied to two objects and elements of positive and negative presheaves. Finally we have elements of sets, which correspond to natural transformations of multiple inputs, where again we view natural transformations valued in a profunctor as generalized elements of profunctors.

After these forms we have types and terms, which represent the meta-language that we use to talk about categories/profunctors/natural transformations. In addition to standard dependent type theory with Π and Σ and identity types, we have universes of categories, functors, profunctors and natural transformations.

Finally we have several forms of context which are used in the theory. The contexts Γ of term variables with their types are as usual; we write "Γ type context" to indicate that a context is well-formed. We name the remaining contexts after the judgements that they are used by. The set contexts Ξ, which will be used to type-check sets, contain object variables with their categories. The two forms of set context are α : ℂ, containing one variable that can be used both contravariantly and covariantly, and α : ℂ; β : 𝔻, containing a contravariant variable α and covariant variable β. Finally, the transformation contexts Φ contain element variables with their sets, alternating with those sets' object variables with their categories. A typical Φ has the shape

$$\alpha_1 : \mathbb{C}_1, x_1 : R_1(\alpha_1, \alpha_2), \alpha_2 : \mathbb{C}_2, x_2 : R_2(\alpha_2, \alpha_3), \dots, x_n : R_n(\alpha_n, \alpha_{n+1}), \alpha_{n+1} : \mathbb{C}_{n+1}$$

and represents the composition of the "relations" R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>, ..., R<sub>n</sub>. We write d<sup>−</sup>(Φ) for the first category variable in Φ (which we regard as the negative or contravariant position), d<sup>+</sup>(Φ) for the last category variable in Φ (which we regard as the positive or covariant position) and use the notation d<sup>±</sup>(Ξ) with the same meaning. We write Φ<sub>1</sub> ∨ Φ<sub>2</sub> for the append of two transformation contexts, which is only well-formed when the last variable in Φ<sub>1</sub> is equal to the first variable in Φ<sub>2</sub>. Formal inductive definitions are in the appendix, but intuitively:

$$\begin{array}{rcl} d^{-}(\alpha_1 : \mathbb{C}_1, x_1 : R_1(\alpha_1, \alpha_2), \ldots, x_n : R_n(\alpha_n, \alpha_{n+1}), \alpha_{n+1} : \mathbb{C}_{n+1}) &=& \alpha_1 : \mathbb{C}_1 \\ d^{+}(\alpha_1 : \mathbb{C}_1, x_1 : R_1(\alpha_1, \alpha_2), \ldots, x_n : R_n(\alpha_n, \alpha_{n+1}), \alpha_{n+1} : \mathbb{C}_{n+1}) &=& \alpha_{n+1} : \mathbb{C}_{n+1} \\ (\Phi_1, \beta : \mathbb{D}) \vee (\beta : \mathbb{D}, \Phi_2) &=& \Phi_1, \beta : \mathbb{D}, \Phi_2 \end{array}$$
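The endpoint and append operations can be sketched as operations on alternating sequences. The representation below (tuples of `ObjVar` and `ElemVar` records, with categories and sets referred to only by name) is a hypothetical encoding chosen for illustration, and is not the Agda formalization mentioned in the introduction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjVar:
    """An object variable together with the name of its category."""
    name: str
    category: str

@dataclass(frozen=True)
class ElemVar:
    """An element variable together with the name of its set."""
    name: str
    set_name: str

def well_formed(phi):
    """A context alternates object and element variables and both
    starts and ends with an object variable."""
    if len(phi) % 2 == 0:
        return False
    return all(isinstance(v, ObjVar if i % 2 == 0 else ElemVar)
               for i, v in enumerate(phi))

def d_minus(phi):
    """The first (contravariant) object variable."""
    return phi[0]

def d_plus(phi):
    """The last (covariant) object variable."""
    return phi[-1]

def append(phi1, phi2):
    """Append two contexts, identifying the shared endpoint."""
    if d_plus(phi1) != d_minus(phi2):
        raise ValueError("contexts do not share an endpoint")
    return phi1 + phi2[1:]

phi1 = (ObjVar("alpha", "C"), ElemVar("x", "R"), ObjVar("beta", "D"))
phi2 = (ObjVar("beta", "D"), ElemVar("y", "S"), ObjVar("gamma", "E"))
phi = append(phi1, phi2)
```

Here `append(phi2, phi1)` raises `ValueError`, mirroring the side condition that the append is only well-formed when the endpoints agree.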

Next, we overview our basic judgement forms. We have


Categories , , - ::= -<sup>M</sup> | <sup>×</sup> <sup>|</sup> <sup>|</sup> - <sup>α</sup>;<sup>β</sup> <sup>P</sup> | P<sup>−</sup> | P<sup>+</sup> Objects a, b, c ::= <sup>α</sup> <sup>|</sup> Ma <sup>|</sup> (a, b) <sup>|</sup> () <sup>|</sup> <sup>π</sup>i<sup>a</sup> <sup>|</sup> (a−, a+, s) <sup>|</sup> <sup>π</sup>−<sup>a</sup> <sup>|</sup> <sup>π</sup>+<sup>a</sup> <sup>|</sup> λα : .R Sets P, Q, R ::= <sup>a</sup> <sup>→</sup> <sup>b</sup> <sup>|</sup> <sup>P</sup> <sup>∃</sup><sup>β</sup> <sup>Q</sup> <sup>|</sup> P ∀<sup>β</sup> <sup>Q</sup> <sup>|</sup> <sup>S</sup> <sup>∀</sup><sup>α</sup> R <sup>|</sup> <sup>1</sup> <sup>|</sup> <sup>P</sup> <sup>×</sup> <sup>Q</sup> | M(a; b) | b ∈ a | a b Elements s, t, u ::= <sup>x</sup> <sup>|</sup> ind→(α.t, b1, s, b2) <sup>|</sup> id<sup>b</sup> <sup>|</sup> ind(x, β, y.r; <sup>s</sup>) <sup>|</sup> (s, b, t) <sup>|</sup> s <sup>a</sup> <sup>t</sup> <sup>|</sup> <sup>λ</sup>(x, α).s <sup>|</sup> <sup>s</sup> <sup>a</sup> t <sup>|</sup> <sup>λ</sup>(α, x).s <sup>|</sup> <sup>π</sup>i<sup>s</sup> <sup>|</sup> (s1, s2) <sup>|</sup> () <sup>|</sup> <sup>π</sup>e<sup>a</sup> <sup>|</sup> <sup>M</sup><sup>b</sup> Type A, B, C ::= ... <sup>|</sup> SmallCat <sup>|</sup> Cat <sup>|</sup> Fun <sup>|</sup> Prof | ∀<sup>α</sup> : .R Term L, M, N ::= ... <sup>|</sup> <sup>|</sup> λα : .a <sup>|</sup> <sup>λ</sup>(<sup>α</sup> : ; <sup>β</sup> : ).R <sup>|</sup> λα.t Type Context Γ,Δ ::= · | Γ, X : A Set Context Ξ,Z ::= <sup>α</sup> : <sup>|</sup> <sup>α</sup> : ; <sup>β</sup> : Trans. Context Φ, Ψ ::= <sup>α</sup> : <sup>|</sup> Φ, x : P, β :

**Fig. 2.** VETT Syntactic Forms

must be parameterized by the same contravariant and covariant object variables. To ensure this, we use a coercion operation Φ from transformation contexts to set contexts that erases everything in the context but the leftmost and right-most object variables (α : <sup>=</sup> <sup>α</sup> : and <sup>Φ</sup> <sup>=</sup> <sup>d</sup><sup>−</sup>(Φ); <sup>d</sup><sup>+</sup>(Φ)).

**–** Meta-language types and terms: Γ ⊢ A Type and Γ ⊢ M : A as in standard dependent type theory.

The variable rules for objects and elements are

$$\overline{\Gamma \mid \alpha : \mathbb{C} \vdash \alpha : \mathbb{C}} \qquad \overline{\Gamma \mid \alpha : \mathbb{C}, x : R, \beta : \mathbb{D} \vdash x : R}$$

As when using variables in linear logic, the latter rule applies only when the context contains a single set R. All syntactic forms typed in context admit an action of substitution. For types and terms, this is as usual. Objects α : ℂ ⊢ a : 𝔻 can be substituted for object variables β : 𝔻 in other objects. We can also substitute objects into *sets*, that is, if we have a set P parameterized by a contravariant variable α : ℂ and a covariant variable β : 𝔻, then we can substitute objects a : ℂ and b : 𝔻 for these variables, written P[a/α; b/β]. This generalizes the ordinary precomposition of a relation by a function. Semantically this is the "restriction" of a profunctor along two functors, which is just composition of functors if a profunctor is viewed as a functor to Set. Modeling this operation as a substitution considerably simplifies reasoning using profunctors. Finally we have the action of substitution on elements/natural transformations. First, we can substitute elements/natural transformations for the set variables in elements, denoting the composition of natural transformations. Second, an element is also parameterized by a contravariant and a covariant category variable α; β. We can think of natural transformations as *polymorphic* in the categories involved, and so when we make a transformation substitution, we also *instantiate* the polymorphic category variables with objects. The full syntactic details of substitution are included in the appendix.
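For relations, the substitution P[a/α; b/β] is ordinary precomposition with two functions. The following sketch, in which `restrict` and the example data are our own illustrative names, computes such a restriction and checks that two successive substitutions agree with substituting the composite functions.

```python
def restrict(P, f, g, A, B):
    """P[f/alpha; g/beta]: relate a in A to b in B when P(f(a), g(b))."""
    return {(a, b) for a in A for b in B if (f(a), g(b)) in P}

# P is the ordering relation on 0..4.
P = {(i, j) for i in range(5) for j in range(5) if i <= j}

A = [0, 1, 2]
B = [0, 1, 2, 3]

def double(a):
    return 2 * a

def successor(b):
    return b + 1

# Relates a to b exactly when 2a <= b + 1.
restricted = restrict(P, double, successor, A, B)

# Substituting twice agrees with substituting the composites.
A2 = [0, 1]
B2 = [0, 1, 2]

def shift(a):
    return a + 1

def identity(b):
    return b

twice = restrict(restricted, shift, identity, A2, B2)
at_once = restrict(P, lambda a: double(shift(a)),
                   lambda b: successor(identity(b)), A2, B2)
```

The equality of `twice` and `at_once` is the relational shadow of the fact that restriction is composition of functors when a profunctor is viewed as a functor to Set.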

### **2.1 Category Connectives**

In this section we discuss some connectives for constructing categories, which are specified by introduction and elimination rules in Figure 3 (the βη equality and substitution rules are included in the appendix). The introduction and elimination rules make use of functors, profunctors, and natural transformations. First we introduce the additive connectives: the unit category 1 and product category ℂ × 𝔻 have the usual introduction and elimination rules defining functors to/from them. Next, we introduce the *graph of a profunctor* P, whose notation binds the variables α; β of P. Just as a relation R : A × B → Set can be viewed as a subset {(a, b) ∈ A × B | R(a, b)}, any profunctor P : ℂ<sub>−</sub><sup>o</sup> × ℂ<sub>+</sub> → Set can be viewed as a category with a functor to ℂ<sub>−</sub> × ℂ<sub>+</sub> (no op), specifically a two-sided discrete fibration. In set-based category theory, the objects of the graph of P are triples (a<sub>−</sub>, a<sub>+</sub>, s : P(a<sub>−</sub>, a<sub>+</sub>)) and morphisms from (a<sub>−</sub>, a<sub>+</sub>, s) to (a′<sub>−</sub>, a′<sub>+</sub>, s′) are pairs of morphisms f<sub>−</sub> : a<sub>−</sub> → a′<sub>−</sub> and f<sub>+</sub> : a<sub>+</sub> → a′<sub>+</sub> such that P(id, f<sub>+</sub>)(s) = P(f<sub>−</sub>, id)(s′). With various choices of P, this connective can be used to define the arrow category, slice category, comma category and category of elements. In our syntax we define it as the universal category equipped with functors to ℂ<sub>−</sub> and ℂ<sub>+</sub> and a natural transformation to P.
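In the preorder case the graph is simply the set of related pairs, ordered componentwise, since the equation on morphisms holds automatically when P is proposition-valued. The sketch below uses our own names `graph` and `graph_leq` and takes P to be the ordering of a three-element chain, which yields the preorder analogue of the arrow category.

```python
def graph(C, D, P):
    """Objects of the graph: the pairs (c, d) with P(c, d)."""
    return [(c, d) for c in C for d in D if (c, d) in P]

def graph_leq(x, y, leq_C, leq_D):
    """Ordering on the graph: componentwise, with no 'op' on the first
    component. No further condition is needed because P has no proof data."""
    return leq_C(x[0], y[0]) and leq_D(x[1], y[1])

def leq(a, b):
    return a <= b

chain = [0, 1, 2]
hom = {(a, b) for a in chain for b in chain if leq(a, b)}

# Taking P to be the hom relation gives the "arrow category" of the chain.
arrows = graph(chain, chain, hom)

# The two projections send an object of the graph to its endpoints.
sources = [a for a, _ in arrows]
targets = [b for _, b in arrows]
```

Both projections are monotone for `graph_leq`, matching the description of the graph as coming equipped with functors to ℂ<sub>−</sub> and ℂ<sub>+</sub>.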

Lastly, we define the *negative* and *positive* presheaf categories 𝒫<sup>−</sup>ℂ and 𝒫<sup>+</sup>ℂ. These are given a syntax suggestive of the fact that they generalize the notion of a powerset, and so can be thought of as "power categories". Note that we include a restriction that the input category is *small*, which is inductively defined by saying that all base categories are small, the unit is small, a product of small categories is small, and the graph of a profunctor over small categories is small. Notably, the presheaf categories themselves are not small. The negative presheaf category is defined by its universal property that a functor into it 𝔻 → 𝒫<sup>−</sup>ℂ is equivalent to a profunctor ℂ<sup>o</sup> × 𝔻 → Set. The introduction rule constructs an object of the negative presheaf category from such a profunctor and the elimination rule inverts it. We use the notation p ∈ a for the elements of the induced profunctor. Since a occurs in a negative position, it must depend only on the contravariant variable d<sup>−</sup>Ξ and vice-versa for p. The positive presheaf category is then the dual. In ordinary set-theoretic category theory the negative presheaf category is the usual presheaf category Set<sup>ℂ<sup>o</sup></sup>, and the positive presheaf category is the opposite of the dual presheaf category, (Set<sup>ℂ</sup>)<sup>o</sup>.

#### **2.2 Set Connectives**

Next, in Figure 4, we cover the connectives for the sets/profunctors, which classify elements/natural transformations (the β/η-rules are in the appendix). First, the unit set a → b is our syntax for the profunctor of morphisms in ℂ instantiated at generalized objects a and b. Its introduction and elimination rules are analogous to the usual rules for equality in intensional Martin-Löf type theory. The introduction rule is the identity morphism (reflexivity) and the elimination rule is an induction principle: we can use a term s : a → b by specifying the behavior when s is of the form id<sub>α</sub> in the form of a continuation α.t. Like the J elimination rule for equality in Martin-Löf type theory, P must be "fully general", i.e., well-typed for variables α and β. This is because for distinct variables α and β, α → β denotes the unit in a virtual double category, which has a universal property, but a → b denotes a restriction of the unit, which in general does not. Those familiar with linear logic as in e.g. [41] might expect a more general rule, where the continuation t is allowed to use variables that are not used in s, i.e., have a context Φ<sub>l</sub> ∨ Φ<sub>r</sub> and the conclusion of the rule to have a context Φ<sub>l</sub> ∨ Φ ∨ Φ<sub>r</sub>. Because of dependency, this is not necessarily well-formed in cases where the endpoints a and b of a → b are not distinct variables. However, the instances of this more general rule that do type check are derivable from our more restricted rule using right/left-hom types.

The tensor product of sets is a kind of combined existential quantifier and monoidal product, which we combine into a single notation P ∃β Q, where β is the covariant variable of P and the contravariant variable of Q. Then the covariant variable of the tensor product is the covariant variable of Q and the contravariant variable similarly comes from P. In ordinary category theory, this is the composition of profunctors, and is defined by a coend of a product. We require that the variable β quantifies over a small category, as in general this composite does not exist for large categories. The introduction and elimination rules are like those for a combined tensor product and existential type: the introduction rule is a pair of terms, with an appropriate instantiation of β, and the elimination rule says that to use a term of a tensor product, it is sufficient to specify the behavior on two elements typed with an arbitrary middle object β.
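As a rough illustration of the "coend of a product" reading, the following Python sketch composes two families of finite sets in the simplest possible setting, where the middle category is discrete (it has only identity morphisms). In that case the coend reduces to a disjoint union over the middle object, so an element of the composite is a triple (p, b, q), mirroring the pair-with-instantiation shape of the introduction rule. The function `compose` and the sample data are hypothetical names chosen for this sketch; for a non-discrete middle category one would additionally have to quotient by the action of its morphisms, which is not modeled here.

```python
from itertools import product

def compose(P, Q):
    """Compose two families of finite sets indexed by pairs of objects.

    P[(a, b)] plays the role of P(a; b) and Q[(b, c)] of Q(b; c).
    Assumption: the middle category is discrete, so the coend is just
    a disjoint union over b and no quotient is needed.
    """
    composite = {}
    for (a, b), ps in P.items():
        for (b2, c), qs in Q.items():
            if b == b2:
                # An element of the composite is a triple (p, b, q).
                composite.setdefault((a, c), set()).update(
                    (p, b, q) for p, q in product(ps, qs)
                )
    return composite

# Two small sample families (illustrative data only).
P = {("a", "x"): {"p1", "p2"}, ("a", "y"): {"p3"}}
Q = {("x", "c"): {"q1"}, ("y", "c"): {"q2", "q3"}}
PQ = compose(P, Q)
print(sorted(PQ[("a", "c")]))
```

The size of each composite set is the sum over middle objects of the product of the sizes, which is the familiar "matrix multiplication" picture of profunctor composition.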

Next, we introduce the contravariant (P <sup>∀α</sup>◁ R) and covariant (R ▷<sup>∀α</sup> P) homs of sets, which are different from each other because we are in an ordered logic. These are a kind of universally quantified function type, where the universally quantified variable must occur with the same variance in domain and codomain. In the contravariant case, it occurs as the contravariant variable in both, and vice-versa for the covariant case. To highlight this, the notation for the contravariant dependence puts the quantified variable on the *left* of the triangle, as contravariant variables occur to the left of the covariant variable, and similarly the covariant hom has the quantified variable on the right. Similar to ordered lambda calculus, the covariant hom is right-associative while the contravariant hom is left-associative. Then the covariant variable of the contravariant hom set is the covariant variable of the codomain, and the contravariant variable of the hom set is the *covariant* variable of the domain, as the two contravariances cancel. The covariant hom is dual. Semantically, in ordinary category theory these are known as the *hom* of profunctors and are adjoint to the composition of profunctors [7]. The two connectives have similar introduction and elimination rules in the form of λ terms abstracting over both the object of the category and the element of the set, and appropriate application forms. To keep with our invariant that the variable occurrences occur left to right in the term syntax in a manner matching the context, we write the covariant application in the usual order s <sup>a</sup> t where the function is on the left and the argument is on the right, and the contravariant application in the flipped order. We also write the instantiating object as a superscript to de-emphasize it, as in practice it can often be inferred.

Finally, we have the cartesian unit and product sets, which are analogous to the normal unit and product of types. The most notable point to emphasize is that in the formation rule for the product, the two subformulae should have the same covariant and contravariant dependence (as with linear logic, some constructions can syntactically use a variable more than once and still be "linear").

## **2.3 Type Connectives**

Finally, we briefly describe the connectives for the "meta-logic", which extends Martin-Löf type theory with Π/Σ and extensional identity types (with their standard rules) (Fig. 5). We use extensional identity types so that the description of models is simpler, but intensional identity types could be used instead. The types we include are *universes* for the object categorical logic: types of small categories and locally small categories, functors, profunctors and natural transformations. The rules for the types of small categories and (large) categories are very similar: any definable category defines an element of type Cat, and any element of that type can be reflected back into a category. The only difference for SmallCat is that the categories involved additionally satisfy Small. Again we elide the βη principles, which state that the introduction and elimination forms are mutually inverse. Since every small category is a category, there is a definable inclusion function from SmallCat to Cat and the βη properties ensure that this is a monomorphism.

**Fig. 4.** Set Connectives

Next, we have the types of all functors and profunctors between any two fixed categories. The introduction and elimination forms are those for unary and binary function types respectively, where metalanguage terms of type Fun can be used to construct an object/functor, while metalanguage terms of type Prof can be used to construct a set/profunctor.

Finally, we include a type ∀α : ℂ. P which we call the set of "natural elements" of P. The name comes from the case that P is of the form F(α) → G(α), in which case the type ∀α : ℂ. F(α) → G(α) can be interpreted as the set of all natural transformations from F to G. More generally this is modeled as an end, and we notate it with a universal quantifier (just as we do for the quantifiers in left/right hom types). Syntactically, ∀α.P is a meta-language type that represents elements/natural transformations with exactly one free variable.

## **3 Formal Category Theory in VETT**

To show what formal category theory in VETT looks like, we present some basic definitions and theorems. While it is well known that much category theory can be formalized in virtual equipments, these examples show how the VETT syntax gives a more familiar presentation of these constructions, while still avoiding the need for explicit naturality and functoriality side conditions. We have mechanized some of the results in this section (e.g. Lemma 2 and Lemma 3 and the maps in Lemma 4) in Agda.<sup>3</sup>

**Fig. 5.** Type Connectives

First, using the elimination rule for the unit set, we can see that all constructions are (pro-)functorial:

**Construction 1** *For any small category , we can construct natural elements*


Identity and Composition generalize the reflexivity and transitivity properties of equality, respectively, with the lack of symmetry being a key feature of the generalization. In addition, we can prove that the (pro-)functoriality axioms commute with the composition proof by the η principle for the unit. (Pro-)Functoriality generalizes the statement that all functions and relations respect equality. Naturality is more complex to state, and it is a statement about the *proofs* so it has no analog in ordinary higher-order logic. The following version is stated for any *profunctor*, with the usual case of naturality arising when R α β = F α → G β.

**Lemma 1 (Naturality).** *For any* t : ∀α : ℂ. R(α; α)*, by composing with profunctoriality, we can construct terms* α<sub>1</sub> : ℂ, f : α<sub>1</sub> → α<sub>2</sub>, α<sub>2</sub> : ℂ ⊢ *lcomp*(f, t<sup>α<sub>2</sub></sup>) *and rcomp*(t<sup>α<sub>1</sub></sup>, f) : R(α<sub>1</sub>; α<sub>2</sub>) *that are both equal to ind*<sup>→</sup>(f, t)*.*

Next, we turn to some of the central theorems of category theory, the Yoneda and Co-Yoneda lemmas. Despite being ultimately quite elementary, these are notoriously abstract. In VETT, we view these as ordered generalizations of some very simple tautologies about equality. For instance, the Yoneda lemma generalizes the equivalence between the formulae ∀y.x = y ⇒ P y and P x for any x.
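The equality tautology mentioned above, and its ordered generalization, can be checked by brute force on a small finite example. The sketch below is only an illustration in the truth-valued (preorder) setting, not the VETT statement itself: `yoneda_holds`, `divides` and the sample predicates are hypothetical names, and divisibility on 1..12 stands in for the "morphism" relation. The ordered version holds for predicates that respect the order (are upward closed) and can fail otherwise, which reflects the role of functoriality.

```python
def implies(p, q):
    return (not p) or q

def yoneda_holds(rel, pred, domain):
    """Check (forall y. rel(x, y) => pred(y)) <=> pred(x) for every x."""
    return all(
        all(implies(rel(x, y), pred(y)) for y in domain) == pred(x)
        for x in domain
    )

domain = range(1, 13)
equals = lambda x, y: x == y
divides = lambda x, y: y % x == 0       # a preorder that is not symmetric

multiple_of_3 = lambda n: n % 3 == 0    # upward closed under divisibility
below_5 = lambda n: n < 5               # not upward closed under divisibility

# With equality, any predicate satisfies the tautology.
print(yoneda_holds(equals, below_5, domain))
# With a preorder, predicates that respect the order still satisfy it ...
print(yoneda_holds(divides, multiple_of_3, domain))
# ... but a predicate that does not respect the order need not.
print(yoneda_holds(divides, below_5, domain))
```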

<sup>3</sup> https://github.com/maxsnew/virtual-equipments/blob/master/agda/Examples.agda

**Lemma 2.** *Let* <sup>α</sup> : *and* <sup>π</sup> : <sup>P</sup><sup>+</sup>*. Then*


The proofs both follow from the unit elimination rule, which is essentially the Yoneda lemma—the two cases of showing (1) is an isomorphism are precisely the β and η rules for the unit.

Next, we have the "Fubini" theorems, which relate the tensor and hom types. The statement and proofs for these theorems are analogous to proofs relating tensor and hom in ordered logic. For instance, the second isomorphism below is analogous to the equivalence (P Q) - R ∼= P - Q -R in ordered logic.

**Lemma 3 (Fubini).** *The following isomorphisms hold when the corresponding profunctors are well typed.*

*1.* P(α; β) ∃β (Q(β; γ) ∃γ R(γ; δ)) ≅ (P(α; β) ∃β Q(β; γ)) ∃γ R(γ; δ)

*2.* (P(δ; β) ∃β Q(β; γ)) ▷<sup>∀γ</sup> S(α; γ) ≅ P(δ; β) ▷<sup>∀β</sup> Q(β; γ) ▷<sup>∀γ</sup> S(α; γ)

*3.* S(γ; δ) <sup>∀γ</sup>◁ (P(γ; β) ∃β Q(β; α)) ≅ S(γ; δ) <sup>∀γ</sup>◁ P(γ; β) <sup>∀β</sup>◁ Q(β; α)

*4.* Q(δ; γ) ▷<sup>∀γ</sup> (S(β; γ) <sup>∀β</sup>◁ P(β; α)) ≅ (Q(δ; γ) ▷<sup>∀γ</sup> S(β; γ)) <sup>∀β</sup>◁ P(β; α)

*5.* ∀α. P(α; β) ▷<sup>∀β</sup> Q(α; β) ≅ ∀β. Q(α; β) <sup>∀α</sup>◁ P(α; β)

*Proof.* We show one case as an example: the forward direction of (1) is given by λα.λ(x, δ).ind(p, β, y.ind(q, γ, r.((p, β, q), γ, r); y); x).
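To make the analogy with ordered logic concrete, the second isomorphism can be checked in the degenerate case where profunctors are replaced by relations between finite sets (so every set of elements is either empty or a point, and the categories are discrete). In that reading the tensor is relational composition and the hom is a universally quantified implication, and the isomorphism becomes the familiar currying equivalence. The sketch below is an illustration under these assumptions only; the helper names `rand_rel`, `lhs` and `rhs` are hypothetical.

```python
from itertools import product
import random

# alpha, beta, gamma, delta range over small finite sets (discrete categories).
A, B, G, D = range(3), range(3), range(3), range(3)
rng = random.Random(0)

def rand_rel(xs, ys):
    """A random relation, standing in for a truth-valued profunctor."""
    return {(x, y) for x in xs for y in ys if rng.random() < 0.5}

P = rand_rel(D, B)   # P(delta; beta)
Q = rand_rel(B, G)   # Q(beta; gamma)
S = rand_rel(A, G)   # S(alpha; gamma)

def lhs(a, d):
    # forall gamma. (exists beta. P(d; beta) and Q(beta; gamma)) => S(a; gamma)
    return all(
        (not any((d, b) in P and (b, g) in Q for b in B)) or (a, g) in S
        for g in G
    )

def rhs(a, d):
    # forall beta. P(d; beta) => (forall gamma. Q(beta; gamma) => S(a; gamma))
    return all(
        (d, b) not in P or all((b, g) not in Q or (a, g) in S for g in G)
        for b in B
    )

fubini_2 = all(lhs(a, d) == rhs(a, d) for a, d in product(A, D))
print(fubini_2)
```

The check succeeds for any choice of relations, since both sides are logically equivalent; the random data only serves to exercise non-trivial cases.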

Next, we can prove that two definitions of an adjunction are equivalent:

**Lemma 4.** *For* <sup>R</sup> : *Fun and* L : *Fun* -*, the following are in bijection:*


*Proof.* Given the forward homomorphism lr, we can construct η = λα.lr<sup>α</sup> Lα idα. Given the unit we can reconstruct the forward homomorphism using comp (composition) and fctor (functoriality) from Construction 1 as comp<sup>α</sup> <sup>R</sup>(Lα) η<sup>α</sup> Rβ(fctor(R)Lα <sup>β</sup> f).
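The passage between the hom-set formulation and the unit can be illustrated in the order-theoretic special case, where an adjunction is a Galois connection. The sketch below uses direct image and preimage along a function between finite sets as the left and right adjoints; this is a standard example, but the concrete sets, the map `f` and the helper names are assumptions made for the illustration. The unit is obtained by instantiating the hom correspondence at the identity, and the forward direction is rebuilt from the unit together with monotonicity of the right adjoint, mirroring the two steps of the proof.

```python
from itertools import combinations

def subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

X, Y = {0, 1, 2}, {"a", "b"}
f = {0: "a", 1: "a", 2: "b"}          # an arbitrary sample function X -> Y

def L(A):                              # left adjoint: direct image
    return frozenset(f[x] for x in A)

def R(B):                              # right adjoint: preimage
    return frozenset(x for x in X if f[x] in B)

# Hom-set formulation: L(A) <= B  iff  A <= R(B).
hom_iso = all((L(A) <= B) == (A <= R(B)) for A in subsets(X) for B in subsets(Y))

# Unit: instantiate the correspondence at B = L(A) and the identity L(A) <= L(A).
unit = all(A <= R(L(A)) for A in subsets(X))

# Rebuild the forward direction from the unit and monotonicity of R:
# if L(A) <= B then A <= R(L(A)) <= R(B).
rebuilt = all(
    A <= R(L(A)) and R(L(A)) <= R(B)
    for A in subsets(X) for B in subsets(Y) if L(A) <= B
)
print(hom_iso, unit, rebuilt)
```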

We can define weighted limits, which as special cases include ordinary limits and Kan extensions.

**Definition 1.** *For a functor* D : *Fun and a profunctor* W : *Prof, the limit of* D *weighted by* W *is (if it exists) a functor lim*<sup>W</sup> D : *Fun with an isomorphism* α → (*lim*<sup>W</sup> D) k ≅ W k j ▷<sup>∀j</sup> (α → D j)

This generalizes the usual definition that a morphism into a limit is a cone over the diagram (α → D j) to be parameterized by a weight W k j. Then we can prove the well-known theorem that right adjoints preserve (weighted) limits:

**Theorem 1.** *If lim*<sup>W</sup> D *exists and is a limit and* R : *Fun has a left adjoint* L*, then* λκ.R((*lim*<sup>W</sup> D )κ) *is the limit of* λj.R(Dj) *weighted by* W*.*

*Proof.*

$$\gamma \to R((\lim^W D)\kappa) \;\cong\; L\gamma \to (\lim^W D)\kappa \;\cong\; W\kappa j \mathrel{\triangleright^{\forall j}} (L\gamma \to Dj) \;\cong\; W\kappa j \mathrel{\triangleright^{\forall j}} (\gamma \to R(Dj))$$

This is a high-level proof in terms of isomorphisms that may be written in VETT. The first two steps are instantiations of assumptions (adjointness, weighted limits). The last step uses the fact that natural isomorphisms lift to natural isomorphisms of homs of profunctors. The construction of this isomorphism illustrates how naturality need not be proved explicitly in VETT. For any φ : ∀α. R′αβ ▷<sup>∀β</sup> Rαβ and ψ : ∀γ. Sγβ ▷<sup>∀β</sup> S′γβ we can construct a natural transformation φ ▷ ψ : ∀γ. (Rαβ ▷<sup>∀β</sup> Sγβ) ▷<sup>∀α</sup> (R′αβ ▷<sup>∀β</sup> S′γβ) as λγ.λ(f, α).λ(r, β).ψ<sup>γ</sup> <sup>β</sup>(f <sup>β</sup>(φ<sup>α</sup> <sup>β</sup> r)). Furthermore, if φ and ψ have inverses, then φ<sup>−1</sup> ▷ ψ<sup>−1</sup> is the inverse of φ ▷ ψ.
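A small sanity check of the theorem in its simplest instance may help: in posets, limits are meets, and a right adjoint between powersets should therefore preserve intersections. The sketch below uses preimage (a right adjoint to direct image) and contrasts it with direct image, which is a left adjoint and does not preserve intersections in general. This covers only unweighted binary meets in a truth-valued setting; the sets, the map `f` and the helper names are assumptions made for the illustration.

```python
from itertools import combinations

def subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

X, Y = {0, 1, 2}, {"a", "b"}
f = {0: "a", 1: "a", 2: "b"}          # an arbitrary sample function X -> Y

def image(A):                          # left adjoint
    return frozenset(f[x] for x in A)

def preimage(B):                       # right adjoint
    return frozenset(x for x in X if f[x] in B)

# The right adjoint preserves binary meets (intersections).
preimage_preserves_meets = all(
    preimage(B1 & B2) == preimage(B1) & preimage(B2)
    for B1 in subsets(Y) for B2 in subsets(Y)
)

# The left adjoint need not: {0} and {1} are disjoint but have the same image.
image_preserves_meets = all(
    image(A1 & A2) == image(A1) & image(A2)
    for A1 in subsets(X) for A2 in subsets(X)
)
print(preimage_preserves_meets, image_preserves_meets)
```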

## **4 Semantics**

Next, we develop the basics of the model theory for VETT. First, we define a sound and complete notion of categorical model based on hyperdoctrines of virtual equipments. Then we instantiate this general notion of model to show that VETT can be interpreted in ordinary category theory as well as in enriched, internal and indexed notions.

First, we can model the judgmental structure of the unary type theory and predicate logic in *virtual double categories* that are *split fibrant* and have a notion of *small object* [32,17]. We briefly recount the structure present in a virtual double category, but see [17] for a precise definition of the composition rules for 2-cells and of functors of virtual double categories.

**Definition 2.** *A virtual double category* V *consists of*


$$\begin{array}{ccccc} C_0 & \xrightarrow{R_0} & \cdots & \xrightarrow{R_n} & C_n \\ {\scriptstyle f}\,\downarrow & & \phi & & \downarrow\,{\scriptstyle g} \\ D_0 & & \xrightarrow{\quad S \quad} & & D_1 \end{array}$$

*We say that the 2-cell* φ *has* S *as codomain, the sequence* R<sub>0</sub> ... R<sub>n</sub> *as domain and call* f *and* g *the left and right "frames", or that* φ *is framed by* f *and* g*. We say a virtual double category is* split fibrant *when it has a choice of* restrictions*, that is, for any horizontal arrow* R : C ⇸ D *and vertical arrows* f : C′ → C *and* g : D′ → D *there is a chosen horizontal arrow* R(f,g) : C′ ⇸ D′ *with a cartesian 2-cell to* R *framed by* f,g *and these chosen cartesian lifts are functorial in* f,g *([46]). A* choice of small objects *is a subset of the objects* V<sub>s</sub> ⊆ V<sub>o</sub>*. A* morphism *of split fibrant virtual double categories with small objects is a functor of the virtual double categories that additionally preserves the restrictions and smallness of objects. This defines a category fVDCs.*

In the presence of restrictions, every 2-cell can be represented as a "globular" 2-cell where the left and right frames are identities [46]. For example, the 2-cell φ above can be represented as one with the same domain but whose codomain is S(f,g). This property is crucial for the completeness of our semantics as we only include a syntax for these globular terms (proof of Construction 2). Each component of this definition has a direct correspondence to a syntactic structure in VETT. The objects V<sub>o</sub> model the category judgment and the morphisms model the functor judgment. The set V<sub>h</sub> models the profunctor judgment. A composable string R<sub>0</sub> ··· R<sub>n</sub> models the profunctor contexts. The 2-cells correspond to the natural transformation judgment where we have taken the restriction S(F, G) of the codomain. Note that Cruttwell and Shulman define a *virtual equipment* to be a virtual double category with all restrictions and all units. The units are the model of the unit of profunctors connective and so all of our models with the unit will be virtual equipments, hence the name VETT.

To model the dependent type theory and indexing of category-theoretic judgments by a Γ with an action of substitution, we use a variation on Lawvere's notion of *hyperdoctrine* for modeling predicate logic [31]<sup>4</sup>:

**Definition 3 (VETT Judgmental model).** *A VETT judgmental model (VM*<sup>J</sup>*) is a pair of a category with families* C *and a functor* V<sup>(−)</sup> : C<sup>o</sup> → *fVDCs.*

Categories with families C model dependent type theory [22] and for each semantic context Γ, V <sup>Γ</sup> models the VETT judgments in context Γ, with the functoriality modeling the fact that all of these judgments admit a well-behaved action of substitution. A VM<sup>J</sup> is then precisely the structure corresponding to the judgments and actions of substitution in VETT.

**Construction 2 (Syntactic Model)** *The syntax of VETT with any subset of connectives included presents a VM*<sup>J</sup>*.*

*Proof.* Define the category with families using the dependent type structure and the virtual equipment structure having (α-equivalence classes of) syntactic categories as objects, functors/sets as vertical/horizontal arrows and interpreting compositions/restrictions as substitutions. The biggest gap between syntax and semantics is in the definition of the 2-cells. A 2-cell from (α<sub>1</sub> : ℂ<sub>1</sub>; α<sub>2</sub> : ℂ<sub>2</sub> ⊢ R<sub>1</sub>), (α<sub>2</sub> : ℂ<sub>2</sub>; α<sub>3</sub> : ℂ<sub>3</sub> ⊢ R<sub>2</sub>), ... to (β<sub>1</sub> : 𝔻<sub>1</sub>; β<sub>2</sub> : 𝔻<sub>2</sub> ⊢ S) with frames α<sub>1</sub> : ℂ<sub>1</sub> ⊢ b<sub>1</sub> : 𝔻<sub>1</sub> and α<sub>n</sub> : ℂ<sub>n</sub> ⊢ b<sub>2</sub> : 𝔻<sub>2</sub> is given by a term x<sub>1</sub> : R<sub>1</sub>, x<sub>2</sub> : R<sub>2</sub>, ... ⊢ s : S[b<sub>1</sub>/β<sub>1</sub>; b<sub>2</sub>/β<sub>2</sub>]. Composition is defined by substitution.

<sup>4</sup> Note that unlike in hyperdoctrines, we do not require quantifiers adjoint to substitution.

Then the connectives of VETT each precisely correspond to a universal construction in a VM<sup>J</sup> . The Π, Σ,Id types correspond to their standard semantics in a CwF and the connectives for categories and profunctors correspond to universal constructions in the virtual double categories. Products of categories are interpreted as products in the vertical category, and products of sets as products in the category of pro-arrows and 2-cells. The units, tensor and covariant and contravariant homs are modeled by the universal properties of the same names, as described in [46]. The graph of a profunctor is modeled by tabulators [25]. Finally, the covariant and contravariant presheaf categories can be described as a weakening of the definition of a Yoneda equipment from [19] to virtual double categories. More detailed descriptions of these universal properties are included in the extended version [37]. Then the soundness and completeness of this notion of categorical model is formalized by the following initiality theorem.

**Theorem 2 (Initiality).** The syntax of VETT with any subset of connectives that includes the hom types presents a VM<sup>J</sup> that is initial in the category of VM<sup>J</sup> with the chosen instances of the universal properties and functors that preserve such chosen instances.

Proof. Construction 2 can be extended to any connective modularly, with the exception that the unit relies on the presence of hom sets in order to satisfy the "distributivity" requirement that its elimination can occur in any context. Then we can construct the unique morphism to any HVE (hyperdoctrine of virtual equipments) by induction on syntax.

Now that we have a category-theoretic notion of model, we give some model construction theorems that can be used to justify our intuitive notion of semantics in (enriched, internal, indexed) category theory. First, we can extend any set-theoretic model of the category theoretic judgments to a hyperdoctrine of models where the category of families is the category of sets:

**Construction 3** Given a V ∈ fVDCs, we can construct a VM<sup>J</sup> V<sup>−</sup> : Set → vDbl<sup>r</sup> by defining (V<sup>Γ</sup>)<sub>o</sub> to be the set of functions Γ → V<sub>o</sub>, and similarly for morphisms and 2-cells, with all operations given pointwise.

Then to define a model of VETT with a collection of connectives it is sufficient to construct a virtual equipment with the corresponding universal properties. The "standard model" is the virtual double category of locally small categories where the small objects are the small categories.

**Construction 4** Fix a cardinal κ. The virtual double category Cat<sup>κ</sup> is defined to have as objects locally κ-small categories, small objects as κ-small categories, vertical morphisms as functors, horizontal arrows as functors ℂ<sup>o</sup> × 𝔻 → κSet and 2-cells as morphisms of profunctors. Restriction of profunctors is given by composition, which is strictly associative and unital. Cat<sup>U</sup> has objects satisfying the universal properties of all connectives in VETT.

More generally, categories internal to, enriched in and/or indexed by sufficiently nice categories define a virtual equipment that models the connectives of VETT. We highlight one example from the literature that is highly general: Shulman's enriched indexed categories [47]. Shulman's construction defines a virtual double category of large and small V-categories for any pseudofunctor V : S<sup>o</sup> → MonCat where S is a category with finite products. He gives examples that show that this subsumes ordinary internal, enriched and indexed categories for suitable choices of V, as well as more general categories that can be thought of as both indexed and enriched. This is slightly weaker than what we require: to have *split* restrictions, we need that V be a *strict* functor, not merely a pseudofunctor. This is analogous to the situation for dependent type theory, where syntactic substitution is strictly associative, but semantic substitution is typically given by pullback, which is only associative up to unique isomorphism. Shulman's construction carries over when the functor is strict, but some of his example instances would require a strictification theorem.

**Construction 5 (Shulman [47])** *Given any functor* V : S<sup>o</sup> → *SymMonCat such that* S *and* V *have sufficiently well-behaved (indexed)* κ*-products, there is a virtual equipment* V-*Cat whose objects are locally* κ*-small* V*-categories, small objects are* κ*-small* V*-categories etc. This virtual equipment has objects satisfying all of the universal properties needed for a model of VETT.*

A final model that uses a CwF that is not Set would be given by taking extensional dependent type theory as the CwF and interpreting the category-theoretic constructions by their definitions inside type theory.

# **5 Related and Future Work**

We now compare VETT with other calculi for formal category theory.

Cáccamo and Winskel [12] develop a formal language for defining categories, functors (of many variables) and proving existence of natural equivalences between them. Their system can encode profunctors as functors into Set. Their natural equivalence judgment does not have proof terms or equality between equivalences and they do not support natural transformations. Additionally, they only consider ordinary categories as the intended model and do not develop a more general semantics. Riehl and Verity [43] use a formal language of virtual equipments to prove results valid for ∞-categories without concrete manipulation of model categories. They formalize this language as a theory in Makkai's framework of first-order logic with dependent sorts (FOLDS). While this previous work has the same models as VETT, we believe that the syntax we propose in this paper formalizes informal arguments more directly, as shown in Section 3. This is because the FOLDS approach is entirely relational, whereas we formalize concepts like restriction of a profunctor or composition of natural transformations as functional operations (substitution). In particular, this means that our calculus requires only vertically degenerate squares (elements/natural transformations) as a "user-facing" notion, with general squares occurring only in the admissible substitution operations.

The coend calculus [33] is an informal syntax for manipulating profunctors involving ends and coends; an extension of VETT to treat profunctors of many variables of different variances may provide a formal treatment of it.

Myers [35] provides a string diagram calculus for double categories and proarrow equipments, generalizing string diagrams for monoidal categories. These are an alternative approach to type-theoretic calculi, with the string diagrams typically making tensor products simpler to work with, while a type-theoretic calculus like VETT makes the closed structure P ▷<sup>∀α</sup> Q simpler to work with by using bound variables.

Cartesian bicategories are similar to equipments but they axiomatize the bicategory of profunctors rather than the full double category of functors and profunctors [13]. Frey [23] describes preliminary work on a proof system for Cartesian bicategories. Their profunctors are more general than in VETT in that they may have 0, 1 or more covariant or contravariant variables. But they do not have a term syntax for functors or natural transformations.

Our work in this paper fits broadly into a line of work on directed dependent type theories, that is, type theories in which the identity type is interpreted as morphisms in a (possibly ∞-)category. In directed type theories based on a bisimplicial model [42,11,55,54], morphism types are defined using an interval object, like in cubical type theory [8,16,4,3], and universal properties like "morphism induction" are an internally definable property of certain types. Other type theories [38,1] define morphism types via an induction principle, corresponding to the lifting properties of certain kinds of fibrations of categories. While these previous works can express some constructions on Cat that are not expressible in VETT, because VETT is more restricted, VETT conversely has more models, for instance categories enriched in non-cartesian monoidal categories, so the theorems that are provable in VETT apply in more settings.

Finally, some variations on double categories have been used to model the structure of certain program logics. GTT [36] is a logic for vertically thin proarrow equipments, where there is at most one vertical arrow or 2-cell of any type, so their calculus does not include functor or transformation judgments. Another similar calculus is System P [21], which is an internal language of reflexive graph categories, which are like double categories without horizontal composition.

In future work, VETT could incorporate functor categories by generalizing the unary type theory of functors to functors of many variables, in which case ordinary λ calculus can be used to define functor categories as function types, and incorporate multi-variable profunctors as in [23]. This would require the models to have a monoidal structure. Ideas from coeffects and enriched category theory may be useful for defining opposite categories [48,10].

**Acknowledgments.** This material is based on research sponsored by the National Science Foundation under agreement number CCF-1909517 and the United States Air Force Research Laboratory under agreement number FA9550-21-0009 (Tristan Nguyen, program manager). The authors would like to thank David Jaz Myers, Emily Riehl, Mike Shulman, and Dominic Verity for helpful feedback on this work.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Strict Constrained Superposition Calculus for Graphs**

Rachid Echahed, Mnacho Echenim, Mehdi Mhalla, and Nicolas Peltier

Université Grenoble Alpes, LIG, CNRS, Inria, Grenoble INP, 38000 Grenoble, France nicolas.peltier@imag.fr

**Abstract.** We propose a superposition-based proof procedure to reason on equational first order formulas defined over graphs. First, we introduce the considered graphs that are directed labeled graphs with lists of roots standing for pins or interfaces for replacements. Then the syntax and semantics of the considered logic are defined. The formulas at hand are clause sets built on equations and disequations on graphs. Afterwards, a sound and complete proof procedure is provided, and redundancy criteria are introduced to dismiss useless clauses and improve the efficiency of the procedure. In a first step, a set of inference rules is provided in the case of uninterpreted labels. In a second step, the proposed rules are lifted to take into account labels defined as terms interpreted in some arbitrary theory. Particular formulas of interest are Horn clauses, for which stronger redundancy criteria can be devised. Essential differences with the usual term superposition calculus are emphasized.

## **1 Introduction**

Graphs are ubiquitous structures in computer science. They are used to model several notions such as data, program runs (transition systems), networks, software and hardware architectures. They are also often used as foundational structures to model knowledge or data bases, cognitive or intelligent systems as well as physical, chemical or biological phenomena. They constitute, in addition, the basis of operational research or combinatorics. Graphs are, definitely, fundamental structures for modelling, computing and reasoning. Graph transformations have been studied since the early 70's [29]. Some of their applications can be found in [16,18]. In the literature, one can distinguish two main streams of approaches for graph transformation, namely the algebraic approaches [15,12] where category theory is used to define structure transformations in a very abstract and elegant way and the algorithmic approaches where graph transformations are defined by means of the actual algorithms involved in the transformations [20,13].

During the last decade, a very interesting application of graph transformations has emerged in the area of quantum models of computation, see e.g., the calculi ZX [11], ZH [3], ZW [24] or PBS [10]. In these calculi, one can specify quantum algorithms using particular graphs and perform equational reasoning on them to verify the correctness of quantum algorithms, see e.g. the Quantomatic tool [25]. In such situations, automated equational reasoning over graphs is very desirable even though equational theories over graphs are not recursively enumerable in general (see e.g. [7]).

The *superposition calculus* [1] is one of the most successful automated proof procedures for equational theories (on terms), and it is implemented in various theorem provers such as Vampire [28], Spass [32], or E [30]. The calculus operates on finite sets of equational clauses. It is defined as a set of *inference rules*, which deduce new clauses from previous ones. To prune the search space, strong restrictions (based on term orderings and literal selection functions) are imposed on the inferences, and redundancy criteria are provided to detect and dismiss useless clauses. The rules are applied until a contradiction (i.e., the empty clause) is derived or until the set is *saturated*, i.e., no further non-redundant clause may be deduced. The calculus is *refutationally complete*, in the sense that it is able to derive a contradiction from any unsatisfiable clause set. In a recent work [14], we proposed a superposition calculus for testing the unsatisfiability of sets of equations and disequations between graphs whose shapes are inspired by those used in the ZX calculus, where nodes are labeled by first-order (uninterpreted) terms. In the present paper we extend this work in several directions: (i) We tackle full clauses, i.e., disjunctions of equations and disequations. This extension turned out to be much more difficult than we initially expected, due to the fact that no reduction order exists on the considered graphs (see Examples 19 and 22), which complicates the completeness proof. We introduce redundancy criteria that cover some usual deletion and simplification rules. (ii) We lift the obtained calculus into a constrained calculus operating on graphs labeled by terms interpreted in some base theory. The procedure is a semi-decision procedure for unsatisfiability if the underlying theory is (semi) decidable and compact. (iii) We consider a slightly different class of graphs, where multi-edges are allowed. The new framework has the advantage of being both more general and simpler, and it also improves the efficiency of the calculus (more precisely for the computation of "merges" between graphs, see Remark 9).

**Why defining a graph superposition calculus is difficult.** We wish to emphasize some important differences between term and graph superposition. (i) It is well-known that term rewrite systems that are terminating and in which all critical pairs are joinable are confluent. This property plays a key role in the completeness proof of the superposition calculus. However, such a property does *not* hold for graph rewrite systems, and, worse, confluence is undecidable for terminating graph rewrite rules (if confluence is meant modulo isomorphism). As in [14], we overcome this issue by considering a special class of graphs, for which the above property holds. This class is obtained by restricting the way graphs can be composed and replaced, using a sequence of distinguished nodes in the graphs, called roots. (ii) The usual superposition calculus is based on the use of a *reduction order*, i.e., a well-founded order on terms that is total on ground terms and closed under instantiation and embedding. Unfortunately no such order exists for graphs in general (see Example 19). Thus the model construction algorithm used to establish refutational completeness must cope with non-terminating systems (indeed, since a ground equation g ≈ h cannot always be oriented, one must consider both rules: g → h and h → g, which entails that the system does not terminate). Confluence is harder to establish for non-terminating systems and we need to devise a new confluence criterion. (iii) The usual redundancy criterion of [1] (where a clause is considered redundant if it is implied by smaller clauses) does not apply to graphs. For instance, the conclusion of an inference may be strictly bigger than all the premises (see Example 21). This is due to the fact that two graphs may overlap without one of them being included in the other.
Such a behavior cannot be avoided, since, as proven in [14, Theorem 45], satisfiability is undecidable for sets of ground equational clauses defined on graphs (whereas it is well known to be decidable for standard ground clauses based on terms), thus superposition cannot terminate on ground graphs. Furthermore, we show (see Example 22) that the calculus is – rather surprisingly – not compatible with tautology deletion in general (tautology deletion is possible for Horn clauses).

**Related work.** The graphs we are considering are intended to capture (possibly cyclic) circuit-shaped structures such as those used in the ZX or related calculi. They are close to hypergraphs with interfaces as used in some papers (see, e.g., [5]) where the roots or interfaces are used in the gluing process while transforming a graph. We follow an algorithmic approach when transforming the graphs. This approach eases the completeness proofs of the proposed superposition calculus. However, the graph transformations used in the present paper can be encoded as simple double pushout (DPO) [19] steps of the form L ←− Roots −→ R with some additional constraints on matched subgraphs. It is also a particular case of DPOI steps (DPO with interfaces) where the roots play the role of the interfaces [5]. Automated reasoning in the presence of graph structures is not an easy task in general. Several authors have tackled this problem and one can distinguish different approaches in the literature. Variants of Hoare-like calculi have been proposed for the verification of graph transformation systems, see, e.g., [23,26,6,8]. Likewise, model checking procedures have also been devised in the presence of graph structures, see, e.g., [27,31]. In these works, a dynamic logic underlying program execution is assumed. In addition, a dedicated logic is used to express graph properties to be proven. Other techniques have been used to prove graph equivalences such as bisimulation [17] or normalization using terminating and confluent graph rewriting systems [9]. In the paper at hand, we are rather concerned with a refutational proof technique based on superposition dedicated to a class of graphs. Thus our proof procedure departs from all the aforementioned works. To our knowledge, only the report [22] presents a refutational procedure dedicated to ZX diagrams which is close to ours.
However, the authors use the classical superposition calculus [1] over first-order terms and provide a translation from the considered graphs to first-order terms. Such a translation requires additional axioms encoding some graph properties such as associativity and commutativity of graph constructor operations. Such additional axioms are unnecessary in our framework. The class of graph rewriting systems handled in our proof procedure is not necessarily terminating and thus we had to devise new criteria to ensure their (ground) confluence instead of using joinability of pre-critical pairs as done in [4].

The paper is organized as follows. Section 2 introduces some basic notations and defines the considered graphs and the operations used over them. In Section 3 the syntax and semantics of the formulas are introduced. In Section 4, a first set of inference rules is defined to test the satisfiability of sets of clauses where graphs are endowed with uninterpreted labels and its completeness is established modulo a redundancy criterion that captures usual deletion or simplification rules (such as subsumption). In Section 5 the obtained calculus is lifted to graphs labeled with terms that can be interpreted in some arbitrary theory and possibly containing variables. Completeness is guaranteed if the theory is semi-decidable and compact. This last calculus is proven complete and an enhanced redundancy test is proposed. Concluding remarks are given in Section 6. Due to lack of space, proofs are omitted.

## **2 Graphs and Graph Operations**

We briefly review some usual definitions and notations. For any partial function $f$, we denote by $\mathit{dom}(f)$ the domain of $f$. If $f$ and $g$ are partial functions, we write $f(x) = g(x)$ to state that either $x \notin \mathit{dom}(f) \cup \mathit{dom}(g)$ or that $x \in \mathit{dom}(f) \cap \mathit{dom}(g)$ and the images of $x$ by $f$ and $g$ are identical. Given a multiset $m$ and an element $e$, $m(e)$ denotes the multiplicity of $e$ in $m$. For all multisets $m_1$ and $m_2$, we denote by $m_1 + m_2$ and $m_1 - m_2$ the sum and difference of $m_1$ and $m_2$, respectively. We write $m_1 \subseteq m_2$ to state that $m_1$ is included in $m_2$. A multiset containing exactly the elements $e_1,\dots,e_n$ is written $\{\!|e_1,\dots,e_n|\!\}$. We denote by $m_1 \cup m_2$ the union of $m_1$ and $m_2$ (i.e., the minimal multiset containing $m_1$ and $m_2$) defined as follows: for all elements $e$, $(m_1 \cup m_2)(e) = \max(m_1(e), m_2(e))$. Finite sequences may sometimes be identified with sets if the order is not important, e.g., if $\boldsymbol{y} = (y_1,\dots,y_n)$, we may write $x \in \boldsymbol{y}$ to state that $x = y_i$, for some $i = 1,\dots,n$. We recall that a preorder is a binary relation that is reflexive and transitive. Any preorder $\leq$ may be associated with a strict order $<$ defined as follows: $x < y \iff (x \leq y \wedge y \not\leq x)$.
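The multiset operations recalled above can be tried out with Python's `collections.Counter`; this is only an illustrative sketch, not part of the formal development. It assumes that the difference truncates at zero (which is how `Counter` subtraction behaves), and the helper names `included` and `strictly_below` are hypothetical.

```python
from collections import Counter


def included(m1, m2):
    """Inclusion: every element occurs in m2 at least as often as in m1."""
    return all(m2[e] >= n for e, n in m1.items())


def strictly_below(leq, x, y):
    """Strict part of a preorder `leq`: x <= y holds but y <= x does not."""
    return leq(x, y) and not leq(y, x)


m1 = Counter({"a": 2, "b": 1})
m2 = Counter({"a": 1, "c": 3})

msum = m1 + m2    # sum: multiplicities are added
mdiff = m1 - m2   # difference: subtracted, non-positive counts are dropped
munion = m1 | m2  # union: pointwise maximum of the multiplicities
```

Here `munion` is the smallest multiset that includes both `m1` and `m2`, which is the property used later when merging edge multisets.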

The graphs we consider are directed, labeled graphs enriched with a sequence of distinguished nodes, called *roots*:

**Definition 1.** *Let* $\mathcal{N}$ *be a countably infinite set of* nodes *and let* $\mathcal{L}$ *be a set of* labels*, disjoint from* $\mathcal{N}$*. An* $\mathcal{L}$-graph $\mathfrak{g}$ *is a tuple* $\langle N, E, R, L \rangle$*, where:*


*The components* $N$*,* $E$*,* $R$ *and* $L$ *of a graph* $\mathfrak{g}$ *are denoted by* $N_{\mathfrak{g}}$*,* $E_{\mathfrak{g}}$*,* $R_{\mathfrak{g}}$ *and* $L_{\mathfrak{g}}$*, respectively. We denote by* $\widehat{N}_{\mathfrak{g}}$ *the set of nodes* $\alpha \in N_{\mathfrak{g}}$ *that do not occur in* $R_{\mathfrak{g}}$*. The* profile *of a graph* $\mathfrak{g}$*, written* $\mathit{pr}(\mathfrak{g})$*, is the length of* $R_{\mathfrak{g}}$*.*

*Example 2.* The $\mathcal{L}$-graph $\mathfrak{g}$ with $N_{\mathfrak{g}} = \{\rho_1, \alpha, \beta\}$, $E_{\mathfrak{g}} = \{\!|(\rho_1, \alpha), (\rho_1, \beta), (\alpha, \beta)|\!\}$, $R_{\mathfrak{g}} = (\rho_1)$, $\mathit{dom}(L_{\mathfrak{g}}) = \{\alpha, \beta\}$, $L_{\mathfrak{g}}(\alpha) = 0$ and $L_{\mathfrak{g}}(\beta) = 1$ is depicted graphically as follows:

We write $\alpha : \ell$ to state that a node named $\alpha$ is labeled by $\ell$. In many cases, the names of the non-root nodes will be irrelevant, and will thus be omitted. When possible, root nodes will be named $\rho_1, \rho_2, \rho_3, \dots$ in this order.
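As an informal illustration, the graph of Example 2 can be encoded as a small Python record. The class `LGraph`, its field names and the string node names are assumptions made for this sketch only; edges are stored as a multiset of ordered pairs and the labeling as a dictionary that is undefined on unlabeled nodes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LGraph:
    nodes: frozenset  # N: the nodes of the graph
    edges: Counter    # E: multiset of (source, target) pairs
    roots: tuple      # R: sequence of distinguished nodes
    labels: dict      # L: partial map from nodes to labels

    def non_roots(self):
        """Nodes that do not occur in the root sequence."""
        return self.nodes - set(self.roots)

    def profile(self):
        """Length of the root sequence."""
        return len(self.roots)


# The graph of Example 2.
g = LGraph(
    nodes=frozenset({"rho1", "alpha", "beta"}),
    edges=Counter([("rho1", "alpha"), ("rho1", "beta"), ("alpha", "beta")]),
    roots=("rho1",),
    labels={"alpha": 0, "beta": 1},
)
```

With this encoding, `g.profile()` is the profile of the graph and `g.non_roots()` is the set of non-root nodes.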

In the following, L-graphs will be considered up to a renaming of nodes. More precisely, the isomorphism relation on L-graphs is defined as follows.

**Definition 3.** *An* $\mathcal{N}$-renaming $\mu$ *is an injective mapping from* $\mathcal{N}$ *to* $\mathcal{N}$*. It is extended to any* $\mathcal{L}$*-graph* $\mathfrak{g}$ *by replacing every occurrence of a node* $\alpha$ *by* $\mu(\alpha)$*. In particular, the function* $L_{\mu(\mathfrak{g})}$ *is defined as follows:* $L_{\mu(\mathfrak{g})}(\alpha) = \ell$ *iff* $L_{\mathfrak{g}}(\beta) = \ell$ *for some* $\beta \in N_{\mathfrak{g}}$ *such that* $\mu(\beta) = \alpha$ *(*$L_{\mu(\mathfrak{g})}$ *is well-defined since* $\mu$ *is injective). We write* $\mathfrak{g} \equiv \mathfrak{h}$ *if* $\mathfrak{h} = \mu(\mathfrak{g})$*, for some* $\mathcal{N}$*-renaming* $\mu$*. It is easy to check that* $\equiv$ *is an equivalence relation. Two* $\mathcal{L}$*-graphs* $\mathfrak{g}, \mathfrak{h}$ *such that* $\mathfrak{g} \equiv \mathfrak{h}$ *are* isomorphic*.*

## **2.1 Subgraphs and Replacement**

We define the notion of a subgraph. The definition is slightly stronger than the usual one in graph theory because it imposes that only nodes that are roots in the subgraph can be connected to a node outside the subgraph. These roots can be viewed as an "interface" which restricts the way graphs may be connected and composed.

**Definition 4 (Subgraph).** *A graph* $\mathfrak{h}$ *is a* subgraph *of* $\mathfrak{g}$ *(written* $\mathfrak{h} \leq_{\mathrm{g}} \mathfrak{g}$*) if* $N_{\mathfrak{h}} \subseteq N_{\mathfrak{g}}$*,* $E_{\mathfrak{h}} \subseteq E_{\mathfrak{g}}$*,* $\widehat{N}_{\mathfrak{h}} \subseteq \widehat{N}_{\mathfrak{g}}$*,* $L_{\mathfrak{h}}(\alpha) = L_{\mathfrak{g}}(\alpha)$ *for all* $\alpha \in \widehat{N}_{\mathfrak{h}}$ *and if a node* $\alpha$ *occurs in an edge in* $E_{\mathfrak{g}} - E_{\mathfrak{h}}$ *then* $\alpha \notin \widehat{N}_{\mathfrak{h}}$*.*

*Example 5.* Consider the $\mathcal{L}$-graphs $\mathfrak{h}$, $\mathfrak{i}$, $\mathfrak{j}$ and $\mathfrak{k}$ with respective roots $(\alpha, \beta)$, $(\beta)$, $(\alpha)$ and $(\rho_1)$, defined as follows:

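The $\mathcal{L}$-graph $\mathfrak{h}$ is a subgraph of the $\mathcal{L}$-graph $\mathfrak{g}$ from Example 2, but $\mathfrak{i}$, $\mathfrak{j}$ and $\mathfrak{k}$ are not. Indeed, $\alpha$ has different labels in $\mathfrak{g}$ and $\mathfrak{i}$; $\mathfrak{g}$ contains an edge between $\rho_1$ and $\beta$ that does not occur in $\mathfrak{j}$ and $\beta$ is not a root node in $\mathfrak{j}$; and $E_{\mathfrak{g}} - E_{\mathfrak{k}}$ contains the edge $(\alpha, \beta)$ between nodes that are not roots in $\mathfrak{k}$.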
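The subgraph conditions can be checked mechanically. The sketch below is one possible reading of Definition 4, with hypothetical names (`LGraph`, `is_subgraph`). Since the drawings of Example 5 are not available here, the graphs `h` and `k` are reconstructions that merely agree with the textual description (roots $(\alpha, \beta)$ and $(\rho_1)$, and the edge $(\alpha, \beta)$ missing from both); they should be taken as assumptions.

```python
from collections import Counter
from typing import NamedTuple


class LGraph(NamedTuple):
    nodes: frozenset
    edges: Counter  # multiset of (source, target) pairs
    roots: tuple
    labels: dict    # partial labeling; missing keys are unlabeled nodes


def non_roots(g):
    return g.nodes - set(g.roots)


def is_subgraph(h, g):
    inner = non_roots(h)
    if not h.nodes <= g.nodes:
        return False
    if any(g.edges[e] < n for e, n in h.edges.items()):
        return False  # edge multiset of h not included in that of g
    if not inner <= non_roots(g):
        return False
    if any(h.labels.get(a) != g.labels.get(a) for a in inner):
        return False
    # Edges of g outside h may only touch h through its roots.
    return not any(a in inner for e in g.edges - h.edges for a in e)


g = LGraph(frozenset({"rho1", "alpha", "beta"}),
           Counter([("rho1", "alpha"), ("rho1", "beta"), ("alpha", "beta")]),
           ("rho1",), {"alpha": 0, "beta": 1})
# Assumed shape of h: only the two roots, no edges.
h = LGraph(frozenset({"alpha", "beta"}), Counter(), ("alpha", "beta"), {})
# Assumed shape of k: all of g except the edge (alpha, beta).
k = LGraph(g.nodes, Counter([("rho1", "alpha"), ("rho1", "beta")]),
           ("rho1",), {"alpha": 0, "beta": 1})
```

Under these assumptions `is_subgraph(h, g)` holds, while `is_subgraph(k, g)` fails because the edge $(\alpha, \beta)$ of $\mathfrak{g}$ connects two non-root nodes of `k`.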

The replacement operation is defined in a natural way: all vertices and edges occurring in the replaced subgraph are deleted and replaced by those in the replacing graph (we assume that the considered graphs share the same roots).

**Definition 6 (Subgraph replacement).** *Let* $\mathfrak{g}$ *be an* $\mathcal{L}$*-graph and let* $\mathfrak{h}$ *be a subgraph of* $\mathfrak{g}$*. An* $\mathcal{L}$*-graph* $\mathfrak{i}$ *is* substitutable for $\mathfrak{h}$ in $\mathfrak{g}$ *if* $R_{\mathfrak{i}} = R_{\mathfrak{h}}$ *and* $N_{\mathfrak{g}} \cap \widehat{N}_{\mathfrak{i}} = \emptyset$*. If* $\mathfrak{i}$ *is substitutable for* $\mathfrak{h}$ *in* $\mathfrak{g}$*, then we denote by* $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}\}$ *(the* $\mathcal{L}$*-graph obtained by replacing* $\mathfrak{h}$ *by* $\mathfrak{i}$ *in* $\mathfrak{g}$*) the tuple* $\langle N', E', R', L' \rangle$*, where:*

- $N' \stackrel{\text{def}}{=} (N_{\mathfrak{g}} \setminus N_{\mathfrak{h}}) \cup N_{\mathfrak{i}}$. Note that since $R_{\mathfrak{i}} = R_{\mathfrak{h}}$ we have $N' = (N_{\mathfrak{g}} \setminus \widehat{N}_{\mathfrak{h}}) \sqcup \widehat{N}_{\mathfrak{i}}$.
- $E' \stackrel{\text{def}}{=} (E_{\mathfrak{g}} - E_{\mathfrak{h}}) + E_{\mathfrak{i}}$.
- $R' \stackrel{\text{def}}{=} R_{\mathfrak{g}}$.
- $L'(\alpha) \stackrel{\text{def}}{=} \begin{cases} L_{\mathfrak{g}}(\alpha) & \text{if } \alpha \in N_{\mathfrak{g}} \setminus \widehat{N}_{\mathfrak{i}} \\ L_{\mathfrak{i}}(\alpha) & \text{if } \alpha \in \widehat{N}_{\mathfrak{i}} \end{cases}$ for all $\alpha \in N' \setminus R'$.

*Example 7.* Let $\mathfrak{i}'$ be the $\mathcal{L}$-graph with root $(\alpha, \beta)$ defined below. Using the $\mathcal{L}$-graphs $\mathfrak{g}$ and $\mathfrak{h}$ from Examples 2 and 5, we get the following $\mathcal{L}$-graph $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}'\}$ (the edge $(\alpha, \beta)$ occurs twice because it occurs both in $E_{\mathfrak{i}'}$ and in $E_{\mathfrak{g}} - E_{\mathfrak{h}}$):

The notation $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}\}$ is extended to the case where $\mathit{pr}(\mathfrak{i}) = \mathit{pr}(\mathfrak{h})$ as follows: $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}\} \stackrel{\text{def}}{=} \mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}'\}$, where $\mathfrak{i}'$ is any $\mathcal{L}$-graph substitutable for $\mathfrak{h}$ in $\mathfrak{g}$ such that $\mathfrak{i} \equiv \mathfrak{i}'$. Thus the replacement operation possibly involves a renaming step, to avoid conflicts on the names of the nodes. The next proposition states a straightforward property of subgraph replacement:

**Proposition 8.** *Let* $\mathfrak{g}, \mathfrak{h}, \mathfrak{i}, \mathfrak{j}$ *be* $\mathcal{L}$*-graphs, where* $\mathfrak{i} \leq_{\mathrm{g}} \mathfrak{h} \leq_{\mathrm{g}} \mathfrak{g}$ *and* $\mathit{pr}(\mathfrak{i}) = \mathit{pr}(\mathfrak{j})$*. Then* $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{h}\{\mathfrak{i} \leftarrow \mathfrak{j}\}\} \equiv \mathfrak{g}\{\mathfrak{i} \leftarrow \mathfrak{j}\}$*.*
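The replacement operation of Definition 6 can be sketched as follows. The code assumes that `h` is already known to be a subgraph of `g` (this is not checked) and that no renaming step is needed. The names `LGraph` and `replace` are hypothetical, and the replacing graph `i_prime` is an invented stand-in for the graph of Example 7, whose drawing is not available here; it only shares with it the root $(\alpha, \beta)$ and the edge $(\alpha, \beta)$.

```python
from collections import Counter
from typing import NamedTuple


class LGraph(NamedTuple):
    nodes: frozenset
    edges: Counter  # multiset of (source, target) pairs
    roots: tuple
    labels: dict    # partial labeling of the nodes


def replace(g, h, i):
    """Return g{h <- i}, assuming h is a subgraph of g."""
    inner_i = i.nodes - set(i.roots)
    if i.roots != h.roots or g.nodes & inner_i:
        raise ValueError("i is not substitutable for h in g")
    nodes = (g.nodes - h.nodes) | i.nodes
    edges = (g.edges - h.edges) + i.edges
    # Labels come from g outside the non-root nodes of i, and from i on them.
    labels = {a: g.labels[a] for a in nodes - inner_i if a in g.labels}
    labels.update({a: i.labels[a] for a in inner_i if a in i.labels})
    return LGraph(nodes, edges, g.roots, labels)


g = LGraph(frozenset({"rho1", "alpha", "beta"}),
           Counter([("rho1", "alpha"), ("rho1", "beta"), ("alpha", "beta")]),
           ("rho1",), {"alpha": 0, "beta": 1})
h = LGraph(frozenset({"alpha", "beta"}), Counter(), ("alpha", "beta"), {})
i_prime = LGraph(frozenset({"alpha", "beta", "gamma"}),
                 Counter([("alpha", "beta"), ("alpha", "gamma")]),
                 ("alpha", "beta"), {"gamma": 2})

result = replace(g, h, i_prime)
```

In `result`, the edge $(\alpha, \beta)$ has multiplicity 2, mirroring the observation made in Example 7.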

*Remark 9.* Note that Proposition 8 would not hold if edges were defined as sets and not as multisets. For instance, consider $\mathcal{L}$-graphs $\mathfrak{g}$, $\mathfrak{h}$ with two root nodes $\rho_1, \rho_2$, where $\mathfrak{g}$ contains an edge $(\rho_1, \rho_2)$ and $\mathfrak{h}$ contains no edges. If edges are taken as sets then we get $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{g}\} = \mathfrak{g}$ and $\mathfrak{g}\{\mathfrak{g} \leftarrow \mathfrak{h}\} = \mathfrak{h}$, whereas $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{h}\} = \mathfrak{g}$. In our previous work [14], this problem was overcome by restricting ourselves to induced subgraphs (which prevents the replacement of $\mathfrak{h}$ by $\mathfrak{g}$ in $\mathfrak{g}$), but this causes a combinatorial explosion in the definition of the calculus: when one "merges" two subgraphs, it is necessary to add every possible combination of edges connecting a root of the first $\mathcal{L}$-graph to a root of the second one, yielding exponentially many solutions w.r.t. the number of roots (see [14, Definition 30]). Such a behavior is avoided in the new framework.
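The edge computations behind Remark 9 can be replayed on the edge components alone. The sketch below is one way to read the remark and is not taken from the paper: it first replaces $\mathfrak{h}$ by $\mathfrak{g}$ in $\mathfrak{g}$ and then replaces $\mathfrak{g}$ by $\mathfrak{h}$ in the result, once with multisets and once with sets. All variable names are hypothetical.

```python
from collections import Counter

edge = ("rho1", "rho2")

# Edges as multisets: E' = (E_g - E_h) + E_i at each step.
e_g, e_h = Counter([edge]), Counter()
step1 = (e_g - e_h) + e_g    # g{h <- g}: two copies of the edge
step2 = (step1 - e_g) + e_h  # then g replaced by h: one copy survives

# Edges as sets: the same two steps with set difference and union.
s_g, s_h = {edge}, set()
set_step1 = (s_g - s_h) | s_g        # a single edge, i.e. g itself
set_step2 = (set_step1 - s_g) | s_h  # no edge left, i.e. h
```

With multisets the two steps cancel out and the original edge is kept, whereas with sets the second step removes the only edge.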

We now introduce a notion of orthogonality between graphs. The intuition is that two L-graphs will be considered orthogonal if they share no edges and no nodes other than roots.

**Definition 10 (Orthogonal graphs).** *Let* $\mathfrak{g}$ *be an* $\mathcal{L}$*-graph. Two subgraphs* $\mathfrak{h}$ *and* $\mathfrak{i}$ *of* $\mathfrak{g}$ *are* orthogonal in $\mathfrak{g}$*, or simply* orthogonal*, if* $\widehat{N}_{\mathfrak{h}} \cap \widehat{N}_{\mathfrak{i}} = \emptyset$ *and* $E_{\mathfrak{h}} + E_{\mathfrak{i}} \subseteq E_{\mathfrak{g}}$*.*

Note that h and i may share root nodes. Proposition 11 states that the result of the replacement of two orthogonal subgraphs does not depend on the order in which the L-graphs are considered.

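**Proposition 11.** *Let* $\mathfrak{g}$ *be an* $\mathcal{L}$*-graph, and let* $\mathfrak{h}_1$*,* $\mathfrak{h}_2$ *be orthogonal subgraphs of* $\mathfrak{g}$*. For all* $\mathcal{L}$*-graphs* $\mathfrak{i}_1, \mathfrak{i}_2$ *of respective profiles* $\mathit{pr}(\mathfrak{h}_1)$ *and* $\mathit{pr}(\mathfrak{h}_2)$*,* $\mathfrak{h}_2$ *and* $\mathfrak{h}_1$ *are subgraphs of* $\mathfrak{g}\{\mathfrak{h}_1 \leftarrow \mathfrak{i}_1\}$ *and* $\mathfrak{g}\{\mathfrak{h}_2 \leftarrow \mathfrak{i}_2\}$*, respectively, and* $\mathfrak{g}\{\mathfrak{h}_1 \leftarrow \mathfrak{i}_1\}\{\mathfrak{h}_2 \leftarrow \mathfrak{i}_2\} \equiv \mathfrak{g}\{\mathfrak{h}_2 \leftarrow \mathfrak{i}_2\}\{\mathfrak{h}_1 \leftarrow \mathfrak{i}_1\}$*.*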

## **2.2 Graph Merging**

Intuitively, a merge of two $\mathcal{L}$-graphs $\mathfrak{g}_1$ and $\mathfrak{g}_2$ denotes any minimal $\mathcal{L}$-graph containing all vertices, labels and edges in $\mathfrak{g}_1$ and $\mathfrak{g}_2$. More formally:

**Definition 12.** *A* merge *of two* $\mathcal{L}$*-graphs* $\mathfrak{g}_1$ *and* $\mathfrak{g}_2$ *is an* $\mathcal{L}$*-graph* $\mathfrak{h}$ *such that: (i)* $\mathfrak{g}_i \leq_{\mathrm{g}} \mathfrak{h}$*, for all* $i = 1, 2$*; (ii)* $N_{\mathfrak{h}} = N_{\mathfrak{g}_1} \cup N_{\mathfrak{g}_2}$*,* $E_{\mathfrak{h}} = E_{\mathfrak{g}_1} \cup E_{\mathfrak{g}_2}$ *and* $\widehat{N}_{\mathfrak{h}} = \widehat{N}_{\mathfrak{g}_1} \cup \widehat{N}_{\mathfrak{g}_2}$*; (iii) for all* $i = 1, 2$ *and for all* $\alpha \in \widehat{N}_{\mathfrak{g}_i}$*,* $L_{\mathfrak{h}}(\alpha) = L_{\mathfrak{g}_i}(\alpha)$*.*

Note that in contrast to [14, Definition 30], the merge contains no node and edge other than those occurring in $\mathfrak{g}_1$ or $\mathfrak{g}_2$. Moreover, the multiplicity of edges is minimal ($E_{\mathfrak{h}}$ is defined as $E_{\mathfrak{g}_1} \cup E_{\mathfrak{g}_2}$ instead of $E_{\mathfrak{g}_1} + E_{\mathfrak{g}_2}$). It is easy to check that a merge of $\mathfrak{g}_1, \mathfrak{g}_2$ exists iff $L_{\mathfrak{g}_1}(\alpha) = L_{\mathfrak{g}_2}(\alpha)$ holds for all $\alpha \in \widehat{N}_{\mathfrak{g}_1} \cap \widehat{N}_{\mathfrak{g}_2}$. Moreover, all the merges are equal up to a permutation of their roots.
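A merge can be computed directly from Definition 12, as in the following sketch. It is one possible reading of the definition, not the authors' implementation: the names `LGraph`, `is_subgraph` and `merge` are hypothetical, the order chosen for the roots of the result is arbitrary (merges are only determined up to a permutation of their roots), and the two sample graphs are invented, loosely following the root structure of Example 13.

```python
from collections import Counter
from typing import NamedTuple, Optional


class LGraph(NamedTuple):
    nodes: frozenset
    edges: Counter  # multiset of (source, target) pairs
    roots: tuple
    labels: dict    # partial labeling of the non-root nodes


def non_roots(g):
    return g.nodes - set(g.roots)


def is_subgraph(h, g):
    inner = non_roots(h)
    return (h.nodes <= g.nodes
            and all(g.edges[e] >= n for e, n in h.edges.items())
            and inner <= non_roots(g)
            and all(h.labels.get(a) == g.labels.get(a) for a in inner)
            and not any(a in inner for e in g.edges - h.edges for a in e))


def merge(g1, g2) -> Optional[LGraph]:
    """Return a merge of g1 and g2, or None if there is none."""
    inner = non_roots(g1) | non_roots(g2)
    labels = {}
    for a in inner:
        seen = {g.labels.get(a) for g in (g1, g2) if a in non_roots(g)}
        if len(seen) > 1:
            return None  # conflicting labels on a shared non-root node
        (label,) = seen
        if label is not None:
            labels[a] = label
    # Roots of the merge: roots of g1 then g2 that are non-roots in neither.
    roots = tuple(dict.fromkeys(r for r in g1.roots + g2.roots if r not in inner))
    h = LGraph(g1.nodes | g2.nodes, g1.edges | g2.edges, roots, labels)
    return h if is_subgraph(g1, h) and is_subgraph(g2, h) else None


g1 = LGraph(frozenset({"rho1", "rho2", "alpha"}),
            Counter([("rho1", "alpha"), ("alpha", "rho2")]),
            ("rho1", "rho2"), {"alpha": 0})
g2 = LGraph(frozenset({"rho2", "rho3", "gamma"}),
            Counter([("rho2", "gamma"), ("gamma", "rho3")]),
            ("rho2", "rho3"), {"gamma": 2})
m = merge(g1, g2)
```

Because the edge multisets are combined with the pointwise maximum, merging a graph with itself leaves its edges unchanged instead of doubling them.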

*Example 13.* Consider the $\mathcal{L}$-graphs $\mathfrak{g}$ and $\mathfrak{h}$ below, of respective roots $(\rho_1, \rho_2)$ and $(\rho_2, \rho_3)$, where the nodes $\alpha$, $\beta$, $\gamma$ are labeled by 0, 1 and 2, respectively. These $\mathcal{L}$-graphs admit the following merge $\mathfrak{i}$, of root $(\rho_1, \rho_2, \rho_3)$:

*Example 14.* Let g, h, i and j be the L-graphs, defined as follows:

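The $\mathcal{L}$-graph $\mathfrak{g}$ has roots $(\alpha, \beta)$ and $\mathfrak{h}$, $\mathfrak{i}$, $\mathfrak{j}$ have roots $(\alpha)$. Then $\mathfrak{g}$ and $\mathfrak{h}$ admit the following merge, of root $(\alpha)$ (figure not reproduced; its nodes are the root $\alpha$ and the labeled nodes $\gamma : 1$, $\beta : 2$ and $\delta : 3$).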


In contrast, $\mathfrak{g}$ and $\mathfrak{i}$ admit no merge (since $\gamma$ has different labels in the two graphs), and neither do $\mathfrak{g}$ and $\mathfrak{j}$ (due to the edge connecting the non-root node $\gamma$ to $\delta$, which is outside of $\mathfrak{g}$).

**Lemma 15.** *Let* $\mathfrak{g}$ *be an* $\mathcal{L}$*-graph and let* $\mathfrak{h}, \mathfrak{i}$ *be subgraphs of* $\mathfrak{g}$*. Then* $\mathfrak{h}$ *and* $\mathfrak{i}$ *admit a merge* $\mathfrak{j}$*, and for all merges* $\mathfrak{j}$ *of* $\mathfrak{h}$ *and* $\mathfrak{i}$ *we have* $\mathfrak{j} \leq_{\mathrm{g}} \mathfrak{g}$*.*

## **3 An Equational Logic on Graphs**

We now define equational clauses built on L-graphs and their semantics.

**Definition 16.** *An* equation *is an unordered pair written* $\mathfrak{g} \approx \mathfrak{h}$*, where* $\mathfrak{g}, \mathfrak{h}$ *are* $\mathcal{L}$*-graphs such that* $R_{\mathfrak{g}} = R_{\mathfrak{h}}$*. A* literal *is either an equation (positive literal) or the negation of an equation, written* $\mathfrak{g} \not\approx \mathfrak{h}$ *(negative literal). A* clause *is a disjunction of literals. The disjunction may be empty, in which case the clause is written* $\Box$*. A clause is* Horn *if it contains at most one positive literal. A set of clauses is* Horn *if it contains only Horn clauses.*

Note that we assume for technical convenience that the two members of an equation share the same roots. $\mathcal{N}$-renamings $\mu$ are extended to equations, literals and clauses in a straightforward way: $\mu(\mathfrak{g} \approx \mathfrak{h}) \stackrel{\text{def}}{=} \mu(\mathfrak{g}) \approx \mu(\mathfrak{h})$, $\mu(\mathfrak{g} \not\approx \mathfrak{h}) \stackrel{\text{def}}{=} \mu(\mathfrak{g}) \not\approx \mu(\mathfrak{h})$ and $\mu(C \vee D) \stackrel{\text{def}}{=} \mu(C) \vee \mu(D)$. The relation $\equiv$ is extended accordingly.

Sets of clauses built on L-graphs will be interpreted w.r.t. a congruence on L-graphs. Graph congruences are defined in the same way as for terms, except that we also assume that they are closed under isomorphism.

**Definition 17 (Graph Congruence).** *A binary relation* $\bowtie$ *on* $\mathcal{L}$*-graphs is* closed under isomorphisms *if* $\mathfrak{i} \bowtie \mathfrak{h}$ *when* $\mathfrak{g} \bowtie \mathfrak{h}$ *and* $\mathfrak{g} \equiv \mathfrak{i}$*. It is* closed under embeddings *if* $\mathfrak{h} \bowtie \mathfrak{i}$ *entails* $\mathfrak{g}\{\mathfrak{h} \leftarrow \mathfrak{i}\} \bowtie \mathfrak{g}$*. A* congruence *is an equivalence relation on* $\mathcal{L}$*-graphs that is closed under isomorphisms and embeddings.*

**Definition 18.** *A congruence* $\sim$ validates *an expression* $E$ *(written* $\sim\ \models E$*) iff one of the following conditions holds: (i)* $E$ *is an equation* $\mathfrak{g} \approx \mathfrak{h}$ *and* $\mathfrak{g} \sim \mathfrak{h}$*; (ii)* $E$ *is a literal* $\mathfrak{g} \not\approx \mathfrak{h}$ *and* $\mathfrak{g} \not\sim \mathfrak{h}$*; (iii)* $E$ *is a clause* $C$ *and* $\sim$ *validates at least one literal in* $C$*; (iv)* $E$ *is a set of clauses* $\Gamma$ *and* $\sim$ *validates all the clauses in* $\Gamma$*. A congruence* $\sim$ *is a* model *of* $E$ *if* $\sim\ \models E$*. An expression is* satisfiable *if it admits a model and* unsatisfiable *otherwise. A* tautology *is a clause that is true in all congruences.*
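The validation relation of Definition 18 is a straightforward recursion over literals, clauses and clause sets, sketched below. Everything in this sketch is an assumption made for illustration: a literal is encoded as a triple `(positive, g, h)`, a clause as a list of literals, graphs are abstracted as sets of node names, and `same_size` (equality of node counts) is only a stand-in relation; the code does not check that the relation passed in is a congruence.

```python
def validates_literal(equiv, literal):
    """An equation needs related members, a disequation unrelated ones."""
    positive, g, h = literal
    return equiv(g, h) if positive else not equiv(g, h)


def validates_clause(equiv, clause):
    """A clause is validated if at least one of its literals is."""
    return any(validates_literal(equiv, lit) for lit in clause)


def validates_set(equiv, clauses):
    """A clause set is validated if all of its clauses are."""
    return all(validates_clause(equiv, clause) for clause in clauses)


def same_size(g, h):
    # Stand-in relation: graphs are compared by their number of nodes.
    return len(g) == len(h)


a = frozenset({"r", "x"})
b = frozenset({"r", "y"})
c = frozenset({"r"})

clause1 = [(True, a, c), (False, a, b)]  # a = c  or  a != b
clause2 = [(True, a, b)]                 # a = b
empty_clause = []
```

Under `same_size`, `clause2` is validated, `clause1` is not, and the empty clause is never validated, whatever the relation.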

## **4 Superposition Calculus with Uninterpreted Labels**

We define a superposition calculus for testing the satisfiability of sets of clauses. This calculus is *strict* (see, e.g., [2]) in the sense that it does not use the equational factorization rule (as defined in [1]), but uses instead the standard factorization rule that unifies both members of two equations. This choice is motivated by the fact that, as shown in Example 22, graph superposition is not compatible with tautology deletion (except when the clauses are Horn). Since tautology deletion is disabled for non-Horn clauses, equational factorization is not needed anyway. Selection functions are not considered, since they are not compatible with the redundancy criterion.

The usual superposition calculus [1] is parameterized by a *reduction order*, i.e., an order on terms that is well-founded, total on ground terms, and closed under substitutions and embeddings. In the case of L-graphs, no such order can exist if we also add the natural requirement that the order must be closed under renamings, as evidenced by the following example:

*Example 19.* Assume that an order $<$ exists, satisfying the following properties: $<$ is well-founded, closed under isomorphisms and embeddings, and total up to isomorphism (i.e., if $\mathfrak{g} \not\equiv \mathfrak{h}$ then either $\mathfrak{g} < \mathfrak{h}$ or $\mathfrak{h} < \mathfrak{g}$). Consider the $\mathcal{L}$-graphs $\mathfrak{g}$ and $\mathfrak{h}$ with roots $(\rho_1, \rho_2, \rho_3, \rho_4)$ and containing no labels, as well as the $\mathcal{L}$-graphs $\mathfrak{i}$, $\mathfrak{j}$ with an empty sequence of roots, where all nodes are labeled by 0:

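It is clear that $\mathfrak{g} \not\equiv \mathfrak{h}$. Indeed, if $\mu(\mathfrak{g}) = \mathfrak{h}$ holds for some $\mathcal{N}$-renaming $\mu$, then $\mu(R_{\mathfrak{g}}) = R_{\mathfrak{h}}$, i.e., $\mu((\rho_1, \rho_2, \rho_3, \rho_4)) = (\rho_1, \rho_2, \rho_3, \rho_4)$, which entails that $\mu$ is the identity on these nodes. Thus we cannot have $\mu(E_{\mathfrak{g}}) = E_{\mathfrak{h}}$, as the first root ($\rho_1$) is connected to the third root ($\rho_3$) in $\mathfrak{g}$ and to the fourth one ($\rho_4$) in $\mathfrak{h}$. Consequently, we have either $\mathfrak{g} < \mathfrak{h}$ or $\mathfrak{h} < \mathfrak{g}$. Now we also have $\mathfrak{g} \leq_{\mathrm{g}} \mathfrak{i}$ and $\mathfrak{h} \leq_{\mathrm{g}} \mathfrak{j}$, and it is easy to check that $\mathfrak{i}\{\mathfrak{g} \leftarrow \mathfrak{h}\} = \mathfrak{j}$ and $\mathfrak{j}\{\mathfrak{h} \leftarrow \mathfrak{g}\} = \mathfrak{i}$. Thus we have either $\mathfrak{i} < \mathfrak{j}$ or $\mathfrak{j} < \mathfrak{i}$. But since $R_{\mathfrak{i}} = R_{\mathfrak{j}} = ()$ we have $\mathfrak{i} \equiv \mathfrak{j}$: indeed, if $\mu(\rho_1) = \rho_1$, $\mu(\rho_2) = \rho_2$, $\mu(\rho_3) = \rho_4$ and $\mu(\rho_4) = \rho_3$, then $\mu(\mathfrak{i}) = \mathfrak{j}$. Since $<$ is closed under isomorphisms, we would get $\mathfrak{j} < \mathfrak{j}$ or $\mathfrak{i} < \mathfrak{i}$, which contradicts well-foundedness.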

We thus slightly relax the requirement of having a reduction order, and consider instead a preorder $\leq$ on $\mathcal{L}$-graphs that is well-founded, closed under isomorphisms and embeddings, and contains $\leq_{\mathrm{g}}$. We write $\mathfrak{g} < \mathfrak{h}$ if $\mathfrak{g} \leq \mathfrak{h}$ and $\mathfrak{h} \not\leq \mathfrak{g}$, and we write $\mathfrak{g} \simeq \mathfrak{h}$ if $\mathfrak{g} \leq \mathfrak{h}$ and $\mathfrak{h} \leq \mathfrak{g}$. We also assume that the equivalence classes of $\simeq$ are finite, up to isomorphism. It is clear that such preorders exist; for instance, the preorder $\mathfrak{g} \leq \mathfrak{h} \iff \mathit{card}(N_{\mathfrak{g}}) \leq \mathit{card}(N_{\mathfrak{h}})$ fulfills the above properties.

Similarly to the usual superposition calculus, we associate every literal $L$ with a multiset defined as follows: $\mathit{mset}(\mathfrak{g} \not\approx \mathfrak{h}) \stackrel{\text{def}}{=} \{\!|\{\!|\mathfrak{g}, \mathfrak{h}|\!\}|\!\}$ and $\mathit{mset}(\mathfrak{g} \approx \mathfrak{h}) \stackrel{\text{def}}{=} \{\!|\{\!|\mathfrak{g}|\!\}, \{\!|\mathfrak{h}|\!\}|\!\}$. For every clause $C = L_1 \vee \dots \vee L_n$, we define: $\mathit{mset}(C) \stackrel{\text{def}}{=} \{\!|\mathit{mset}(L_i) \mid i = 1,\dots,n|\!\}$. Any order or preorder $\preceq$ on $\mathcal{L}$-graphs may then be extended into an order on clauses as follows: $C \preceq D \iff \mathit{mset}(C) \preceq_m \mathit{mset}(D)$, where $\preceq_m$ denotes the multiset extension of $\preceq$ (note that $\preceq_m$ is also a (pre)order). A literal $L$ is $<$*-maximal* in a clause $C$ if there is no literal $L' \in C$ such that $L' > L$. An $\mathcal{L}$-graph $\mathfrak{g}$ is $<$*-maximal* in a literal $L$ if $L$ contains no $\mathcal{L}$-graph $\mathfrak{g}'$ such that $\mathfrak{g}' > \mathfrak{g}$. A literal $L$ is *eligible* in a clause $C$ if $L$ is a $<$-maximal literal in $C$. Intuitively, eligible literals are those that may be considered for performing inferences. For instance, given a clause $(\mathfrak{g} \approx \mathfrak{h}) \vee (\mathfrak{i} \approx \mathfrak{j})$, if $(\mathfrak{g} \approx \mathfrak{h}) > (\mathfrak{i} \approx \mathfrak{j})$, then $\mathfrak{g} \approx \mathfrak{h}$ is eligible but not $\mathfrak{i} \approx \mathfrak{j}$. Consequently the inference rules (as defined in Section 4.1) will be allowed to replace $\mathfrak{g}$ by $\mathfrak{h}$ using the equation $\mathfrak{g} \approx \mathfrak{h}$ (provided $\mathfrak{g} < \mathfrak{h}$) but not, e.g., $\mathfrak{i}$ by $\mathfrak{j}$ (this restricts the number of inferences and prunes the search space). Non-eligible literals are simply attached to the conclusion of the inference but they play no active role until they (eventually) become eligible.
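For the particular preorder that compares graphs by their number of nodes, the literal and clause orderings can be sketched with sortable keys. This is a simplification under stated assumptions: each graph is abstracted to its node count, a literal is a hypothetical triple `(positive, size_g, size_h)`, and the sketch relies on the fact that, for a total order, the multiset extension coincides with the lexicographic comparison of the elements sorted in decreasing order.

```python
def literal_key(positive, size_g, size_h):
    """Key of mset(L): one singleton per member for an equation,
    a single two-element multiset for a disequation."""
    if positive:
        inner = [[size_g], [size_h]]
    else:
        inner = [sorted([size_g, size_h], reverse=True)]
    return sorted(inner, reverse=True)


def clause_key(clause):
    """Key of mset(C): the literal keys sorted in decreasing order."""
    return sorted((literal_key(*lit) for lit in clause), reverse=True)


def eligible_literals(clause):
    """Literals that are maximal in the clause for the induced ordering."""
    keys = [literal_key(*lit) for lit in clause]
    top = max(keys)
    return [lit for lit, key in zip(clause, keys) if key == top]


# A clause with two equations, over graphs of 3, 2, 2 and 2 nodes.
clause = [(True, 3, 2), (True, 2, 2)]
```

With these keys, a disequation between two graphs is larger than the equation between the same graphs, and only the first literal of `clause` is eligible.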

## **4.1 Inference Rules**

The Superposition calculus SC is defined by the following rules: $\mathsf{Sp}^+$ (positive superposition), $\mathsf{Sp}^-$ (negative superposition), $\mathsf{R}$ (Reflection) and $\mathsf{F}$ (Factoring). The rules and their side conditions are very similar to those of the usual (ground) superposition calculus, except for the use of the merging operation for positive superposition. To simplify notations, the rules are defined modulo isomorphisms, which means that one has to find a renaming of the premises such that the considered rule applies (this can be done using standard algorithms for finding graph homomorphisms). For instance, with this convention, the Reflection rule $\mathsf{R}$ actually removes all disequations of the form $\mathfrak{g} \not\approx \mathfrak{h}$, with $\mathfrak{g} \equiv \mathfrak{h}$.

$$\mathsf{Sp}^+ : \frac{\mathfrak{g}\_1 \approx \mathfrak{h}\_1 \lor C\_1 \quad \mathfrak{g}\_2 \approx \mathfrak{h}\_2 \lor C\_2}{\mathfrak{i}\{\mathfrak{g}\_1 \leftarrow \mathfrak{h}\_1\} \approx \mathfrak{i}\{\mathfrak{g}\_2 \leftarrow \mathfrak{h}\_2\} \lor C\_1 \lor C\_2}$$

where:


The non-orthogonality condition is the analogue of the non-variable condition of the usual calculus; it dismisses trivial replacements.

$$\mathsf{Sp}^{-}: \frac{\mathfrak{g} \approx \mathfrak{h} \lor C \quad \mathfrak{i} \not\approx \mathfrak{j} \lor D}{\mathfrak{i}\{\mathfrak{g} \leftarrow \mathfrak{h}\} \not\approx \mathfrak{j} \lor C \lor D}$$

where:


$$\mathsf{F}: \frac{\mathfrak{g} \approx \mathfrak{h} \lor \mathfrak{g} \approx \mathfrak{h} \lor C}{\mathfrak{g} \approx \mathfrak{h} \lor C} \qquad \text{if } \mathfrak{g} \approx \mathfrak{h} \text{ is eligible in } \mathfrak{g} \approx \mathfrak{h} \lor \mathfrak{g} \approx \mathfrak{h} \lor C.$$

$$\mathsf{R}: \frac{\mathfrak{g} \not\approx \mathfrak{g} \lor C}{C} \qquad \text{if } \mathfrak{g} \not\approx \mathfrak{g} \text{ is eligible in } \mathfrak{g} \not\approx \mathfrak{g} \lor C.$$

**Lemma 20.** *The rules* $\mathsf{Sp}^+$*,* $\mathsf{Sp}^-$*,* $\mathsf{F}$ *and* $\mathsf{R}$ *are sound, i.e., for all congruences* $\sim$ *and for all clauses* $C$ *deducible from a set of premises* $\Gamma$*, we have* $\sim\ \models \Gamma \implies\ \sim\ \models C$*.*

## **4.2 Redundancy**

In the usual superposition calculus [1], a clause is redundant if all its ground instances are entailed by smaller clauses (w.r.t. the considered order). Such clauses can be deleted without threatening refutational completeness, which reduces the search space. In our context, such a definition cannot be used, because one of the inference rules –namely Sp<sup>+</sup>– may generate clauses that are strictly larger than the premises (hence such clauses would be considered as redundant if the usual criterion were to be used).

*Example 21.* Consider the clauses: $\mathfrak{g} \approx \mathfrak{h}$ and $\mathfrak{i} \approx \mathfrak{j}$, where $\mathfrak{g}$, $\mathfrak{h}$, $\mathfrak{i}$, $\mathfrak{j}$ are $\mathcal{L}$-graphs with root $(\rho_1)$ that are defined as follows:

(Figure not reproduced: the graphs $\mathfrak{g}$, $\mathfrak{h}$, $\mathfrak{i}$ and $\mathfrak{j}$, each with root $\rho_1$.)

The $\mathcal{L}$-graphs $\mathfrak{g}$ and $\mathfrak{i}$ admit a merge of root $(\rho_1)$ (figure not reproduced: the root $\rho_1$ together with two nodes labeled 0). Therefore, rule $\mathsf{Sp}^+$ applies, yielding $\mathfrak{g}' \approx \mathfrak{g}''$, where:

(Figure not reproduced: the graphs $\mathfrak{g}'$ and $\mathfrak{g}''$.)

If $\mathcal{L}$-graphs are ordered according to their number of nodes, then we have $(\mathfrak{g}' \approx \mathfrak{g}'') > (\mathfrak{g} \approx \mathfrak{h})$ and $(\mathfrak{g}' \approx \mathfrak{g}'') > (\mathfrak{i} \approx \mathfrak{j})$.

Worse, the calculus is actually incomplete if tautologies are deleted, as shown in the following example.

*Example 22.* Consider the $\mathcal{L}$-graphs $\mathfrak{g}_1$, $\mathfrak{g}_2$ and $\mathfrak{g}_3$ with roots $(\rho_1, \rho_2, \rho_3)$:

Let $\dot{\mathfrak{g}}_i$ denote the graph obtained from $\mathfrak{g}_i$ by adding one additional non-root node $\alpha$ distinct from $\rho_1, \rho_2, \rho_3$, with some arbitrary (but fixed) label, e.g., 0. Assume that the graphs are ordered by the number of nodes, so that $\dot{\mathfrak{g}}_i > \mathfrak{g}_j$, $\dot{\mathfrak{g}}_i \simeq \dot{\mathfrak{g}}_j$ and $\mathfrak{g}_i \simeq \mathfrak{g}_j$ (for all $i, j \in \{1, 2, 3\}$). Let $\Gamma = \{\dot{\mathfrak{g}}_1 \approx \mathfrak{g}_2 \vee \dot{\mathfrak{g}}_2 \approx \mathfrak{g}_3 \vee \dot{\mathfrak{g}}_3 \approx \mathfrak{g}_1,\ \dot{\mathfrak{g}}_1 \not\approx \mathfrak{g}_2 \vee \dot{\mathfrak{g}}_2 \not\approx \mathfrak{g}_3 \vee \dot{\mathfrak{g}}_3 \not\approx \mathfrak{g}_1\}$. Intuitively, every equation $\dot{\mathfrak{g}}_i \approx \mathfrak{g}_j$ where $(i, j) \in \{(1, 2), (2, 3), (3, 1)\}$ states that the semantics of the graph is preserved when the isolated node is deleted and the graph is rotated by 90 degrees clockwise, for each possible position of the loop. Since the graphs are invariant by rotation, all these transformations are actually equivalent. It is easy to check that every clause that can be generated from $\Gamma$ by applying the negative superposition rule from the first clause into the second clause contains two complementary literals (i.e., two literals of the form $\dot{\mathfrak{g}}_i \approx \mathfrak{g}_j$ and $\dot{\mathfrak{g}}_i \not\approx \mathfrak{g}_j$) and hence is a tautology. Moreover, the clauses obtained by superposition using the first clause only either are subsumed by the first clause (if the superposition rule is applied on two different literals) or contain a literal $\mathfrak{g}_i \approx \mathfrak{g}_i$ (hence are tautologies). The equational factorization rule (as defined in [1]) does not apply since $\dot{\mathfrak{g}}_i$ and $\dot{\mathfrak{g}}_j$ are not isomorphic if $i \neq j$. However, consider the $\mathcal{L}$-graphs $\mathfrak{g}'_i$, $\dot{\mathfrak{g}}'_i$ which contain the same nodes and edges as $\mathfrak{g}_i$ and $\dot{\mathfrak{g}}_i$ respectively, but with roots $(\rho_2, \rho_3, \rho_1)$. It is clear that $\mathfrak{g}'_2 \equiv \mathfrak{g}_1$ and $\mathfrak{g}'_3 \equiv \mathfrak{g}_2$, so that $\dot{\mathfrak{g}}_1 \approx \mathfrak{g}_2 \models \dot{\mathfrak{g}}'_2 \approx \mathfrak{g}'_3$. However, $\dot{\mathfrak{g}}'_2 \leq_{\mathrm{g}} \dot{\mathfrak{g}}_2$ and $\dot{\mathfrak{g}}_2\{\dot{\mathfrak{g}}'_2 \leftarrow \mathfrak{g}'_3\} = \mathfrak{g}_3$, thus $\dot{\mathfrak{g}}_1 \approx \mathfrak{g}_2 \models \dot{\mathfrak{g}}_2 \approx \mathfrak{g}_3$. By a similar reasoning, we may show that $\dot{\mathfrak{g}}_2 \approx \mathfrak{g}_3 \models \dot{\mathfrak{g}}_3 \approx \mathfrak{g}_1$ and $\dot{\mathfrak{g}}_3 \approx \mathfrak{g}_1 \models \dot{\mathfrak{g}}_1 \approx \mathfrak{g}_2$, so that the equations $\dot{\mathfrak{g}}_1 \approx \mathfrak{g}_2$, $\dot{\mathfrak{g}}_2 \approx \mathfrak{g}_3$, and $\dot{\mathfrak{g}}_3 \approx \mathfrak{g}_1$ are actually pairwise equivalent, which entails that $\Gamma$ is unsatisfiable. However, $\Box$ cannot be derived from $\Gamma$ if the clauses containing complementary literals are discarded.

Thus, the conditions that ensure that a clause is redundant must be stronger than those of the usual superposition calculus. The definition proposed below covers usual deletion rules such as subsumption. Actually, two different criteria will be used, namely non-strict and strict redundancy, depending on whether the considered clauses are Horn or not. Indeed, in the former case a slightly less restrictive definition can be used, which permits the deletion of (some) tautological clauses.

**Definition 23.** *Let* $C, D$ *be two clauses and let* $\Gamma$ *be a set of clauses. We say that* $C$ *is* subsumed *by* $D$ *and write* $C \geq^{\mathrm{sub}} D$ *if* $C = D \lor C'$*, up to associativity and commutativity of* $\lor$ *and isomorphism. We write* $C \to_\Gamma D$ *(*$C$ demodulates *to* $D$ *w.r.t.* $\Gamma$*) if* $C$ *is of the form* $g \bowtie h \lor E$ *(with* $\bowtie\ \in \{\approx, \not\approx\}$*),* $D = g\{i \leftarrow j\} \bowtie h \lor E$*, and there exists a clause* $F \in \Gamma$ *such that* $F = (i \approx j) \lor F'$*, with* $F' \leq^{\mathrm{sub}} E$*,* $i > j$*,* $F' < (i \approx j)$ *and* $(i \approx j) < (g \bowtie h)$*.*

*The set of clauses that are redundant w.r.t. a set of clauses* $\Gamma$ *is defined inductively as follows. A clause* $C$ *is* redundant *w.r.t.* $\Gamma$ *iff one of the following conditions holds: (1)* $C$ *contains two literals* $g_1 \approx g_2$ *and* $g'_1 \not\approx g'_2$*, with* $g_i \equiv g'_i$ *for* $i = 1, 2$*; (2)* $C$ *contains a literal of the form* $g \approx h$ *with* $g \equiv h$*; (3)* $C \geq^{\mathrm{sub}} D$*, for some* $D \in \Gamma$*; (4)* $C \to_\Gamma D$ *and* $D$ *is redundant. The set of* strictly redundant *ground clauses is defined in a similar way, except that Item 1 is removed.*

Intuitively, the conditions ensuring that $C$ demodulates to $D$ in Definition 23 are meant to ensure that $D$ may be deduced from $C$ by applying the rule $\mathsf{Sp}^{+}$ or $\mathsf{Sp}^{-}$ using the clause $F$ (with $D < C$ and $F < C$) and that $\{D\} \cup \Gamma$ is equivalent to $\{C\} \cup \Gamma$. In particular, the condition $F' \leq^{\mathrm{sub}} E$ ensures that all the literals added by the inference already occur in $C$.

**Definition 24.** *A set of clauses* Γ *is* saturated *(resp.* strictly saturated*) if every clause that can be deduced from premises in* Γ *using one of the rules of* SC *(in one step) is redundant (resp. strictly redundant) w.r.t.* Γ*.*

We prove that SC is refutationally complete. We actually establish two completeness results, the first one for general clauses and the second one for Horn clauses. The latter is stronger since it uses the weaker non-strict saturatedness criterion instead of strict saturatedness.

**Theorem 25.** *Let* $\Gamma$ *be a set of clauses. If* $\square \notin \Gamma$ *and* $\Gamma$ *is strictly saturated or both Horn and saturated then* $\Gamma$ *is satisfiable.*

## **5 A Constrained Graph Superposition Calculus**

We now lift the calculus SC defined in Section 4 into a constrained calculus. The goal is to handle graphs labeled by terms interpreted in some arbitrary theory, and possibly containing variables. To this aim, we attach constraints to the clauses, which are formulas interpreted in the considered theory, asserting conditions on the labels. Such constraints will be updated when inference rules will be applied, by asserting the conditions that are required by the rule applications.

## **5.1 Constrained Clauses**

Let $\mathcal{V}$ be a countably infinite set of *variables* and let $\Sigma$ be a set of *function symbols*<sup>1</sup>. Each symbol $f$ in $\Sigma$ is associated with a unique *arity* $\#(f)$. We denote by $\mathcal{T}$ the set of *terms* built inductively as usual on $\mathcal{V}$ and $\Sigma$, and by $\mathcal{C}$ the set of first-order formulas, called *constraints*, built inductively as usual on atoms of the form $t \doteq s$, where $t, s \in \mathcal{T}$, using the logical connectives $\lor, \land, \neg, \Rightarrow, \Leftrightarrow$, the quantifiers $\exists, \forall$ and two logical constants $\bot$ and $\top$.

A *substitution* $\sigma$ is a function mapping every variable $x$ to a term $x\sigma$. The *domain* $\mathit{dom}(\sigma)$ of $\sigma$ is the set of variables $x$ such that $x\sigma \neq x$. For every term or formula $e$, we denote by $e\sigma$ the term or formula obtained from $e$ by replacing every (free) variable $x$ by $x\sigma$. A term is *ground* if it contains no variables, and a substitution $\sigma$ is *ground* if $x\sigma$ is ground for all $x \in \mathit{dom}(\sigma)$.

$\mathcal{T}$-*graphs* are L-graphs with labels in $\mathcal{T}$. A $\mathcal{T}$-*clause* is a clause defined on $\mathcal{T}$-graphs. Substitutions are extended to $\mathcal{T}$-graphs and $\mathcal{T}$-clauses as follows. For every $\mathcal{T}$-graph $g$, we denote by $g\sigma$ the $\mathcal{T}$-graph such that $F_{g\sigma} = F_g$ for all $F \in \{N, E, R\}$ and $L_{g\sigma}(\alpha) = L_g(\alpha)\sigma$, for all $\alpha \in N_g$. Then: $(g \approx h)\sigma \stackrel{\text{def}}{=} g\sigma \approx h\sigma$, $(g \not\approx h)\sigma \stackrel{\text{def}}{=} g\sigma \not\approx h\sigma$ and $(C \lor D)\sigma \stackrel{\text{def}}{=} C\sigma \lor D\sigma$. A $\mathcal{T}$-graph $g$ is *ground* if for all $\alpha \in N_g$, $L_g(\alpha)$ is ground. A $\mathcal{T}$-clause is *ground* if all the $\mathcal{T}$-graphs occurring in it are ground. For every expression (term, $\mathcal{T}$-graph, constraint or $\mathcal{T}$-clause) $E$, we denote by $\mathcal{V}(E)$ the set of variables (freely) occurring in $E$.
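As an informal illustration of how a substitution acts on terms and on the labels of a T-graph, the following Python sketch may help. The encoding is an assumption made for this example only and is not taken from the paper: a variable is a string, a function application is a tuple `(symbol, arg1, ..., argn)` (so the constant 0 is `("0",)`), and a graph is a dictionary with hypothetical keys `nodes`, `edges`, `roots` and `labels`.

```python
def apply_subst(term, sigma):
    """Apply substitution sigma (dict: variable -> term) to a term."""
    if isinstance(term, str):          # a variable: replace it if sigma maps it
        return sigma.get(term, term)
    symbol, *args = term               # a function application f(t1, ..., tn)
    return (symbol, *(apply_subst(a, sigma) for a in args))


def is_ground(term):
    """A term is ground if it contains no variable."""
    if isinstance(term, str):
        return False
    return all(is_ground(a) for a in term[1:])


def apply_subst_graph(graph, sigma):
    """Nodes, edges and roots are kept; only the labels are instantiated."""
    new_labels = {n: apply_subst(t, sigma) for n, t in graph["labels"].items()}
    return {**graph, "labels": new_labels}


def is_ground_graph(graph):
    """A graph is ground if all its labels are ground terms."""
    return all(is_ground(t) for t in graph["labels"].values())


# Hypothetical graph with two nodes labeled x and y + 1.
example_graph = {
    "nodes": {"a", "b"},
    "edges": [("a", "b")],
    "roots": ("a",),
    "labels": {"a": "x", "b": ("+", "y", ("1",))},
}
example_sigma = {"x": ("0",), "y": ("0",)}
instantiated = apply_subst_graph(example_graph, example_sigma)
```

Under this encoding, `instantiated` has the same nodes, edges and roots as `example_graph`, and its labels are the ground terms 0 and 0 + 1.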

**Definition 26.** *A* constrained clause *(or c-clause) is a pair* [C | φ]*, where* C *is a* T *-clause and* φ ∈ C*.*

Let $\mathcal{I}$ be some fixed set of first-order interpretations on the signature $\Sigma$. For all $I \in \mathcal{I}$, we denote by $\mathit{dom}(I)$ the domain of $I$ and by $f^I$ the interpretation of the function $f$ (with $f \in \Sigma$). For every ground term $t$ and for all $I \in \mathcal{I}$, we denote by $[t]^I$ the value of $t$ in $I$, inductively defined as usual. To simplify notations, we assume that for every $I \in \mathcal{I}$ and for every $e \in \mathit{dom}(I)$, there exists a ground term $t$ such that $[t]^I = e$.

<sup>1</sup> As usual, predicates may be encoded as functions.

The satisfiability relation $\models$ relating interpretations in $\mathcal{I}$ and constraints in $\mathcal{C}$ is defined as usual, where $\doteq$ is interpreted as the identity, and $\bot$ and $\top$ are interpreted as false and true, respectively. We write $\phi \models_{\mathcal{I}} \psi$ if the implication $I \models \phi\sigma \implies I \models \psi\sigma$ holds for all $I \in \mathcal{I}$ and for all ground substitutions $\sigma$ of domain $\mathcal{V}(\phi) \cup \mathcal{V}(\psi)$; and $\phi \equiv_{\mathcal{I}} \psi$ iff $\phi \models_{\mathcal{I}} \psi$ and $\psi \models_{\mathcal{I}} \phi$. For any set of constraints $S$, we write $I \models S$ iff $I \models \phi$ for all $\phi \in S$. For any constraint (or set of constraints) $\phi$, if there exists a ground substitution $\sigma$ with domain $\mathcal{V}(\phi)$ and an interpretation $I \in \mathcal{I}$ such that $I \models \phi\sigma$, then $\phi$ is $\mathcal{I}$*-satisfiable* (and $\mathcal{I}$*-unsatisfiable* otherwise). For instance, the fixed set of first-order interpretations may be the set $\mathcal{I}_1$ of first-order interpretations on $\Sigma$ that satisfy the above condition on the domain (this is not restrictive provided there are infinitely many ground terms), in which case $\mathcal{I}$-satisfiability is simply the standard satisfiability in first-order clausal logic, or the set $\mathcal{I}_{\mathbb{N}}$ of interpretations of domain $\mathbb{N}$ interpreting the functions 0, 1, + as usual. We say that $\mathcal{I}$ is *compact* if for every $\mathcal{I}$-unsatisfiable set of constraints $S$ there exists a finite set $S' \subseteq S$ such that $S'$ is $\mathcal{I}$-unsatisfiable. It is well-known that $\mathcal{I}_1$ is compact [21] and that $\mathcal{I}_{\mathbb{N}}$ is not compact<sup>2</sup>.

Any ground T -graph may be transformed into a *dom*(I)-graph by replacing the labels by their interpretations in I. More formally:

**Definition 27.** *For all* $I \in \mathcal{I}$ *and for all ground* $\mathcal{T}$*-graphs* $g$ *we denote by* $[g]^I$ *the graph such that* $F_{[g]^I} = F_g$ *for all* $F \in \{N, E, R\}$ *and* $L_{[g]^I}(\alpha) = [L_g(\alpha)]^I$*, for all* $\alpha \in N_g$*. For every ground* $\mathcal{T}$*-clause* $C$*, we denote by* $[C]^I$ *the clause obtained from* $C$ *by replacing every* $\mathcal{T}$*-graph* $g$ *by* $[g]^I$*. For all sets of c-clauses* $\Gamma$*, we denote by* $[\Gamma]^I$ *the set of clauses of the form* $[C\sigma]^I$*, where* $C \in \Gamma$ *and* $\sigma$ *is a substitution mapping every variable in* $C$ *to a ground term.*

Note that by definition, all the labels of $[g]^I$ are elements of the domain of $I$. Proposition 28 follows immediately from Definition 27.

**Proposition 28.** *Let* $g, h$ *be* $\mathcal{T}$*-graphs, let* $I \in \mathcal{I}$ *and let* $\sigma$ *be a ground substitution with domain* $\mathcal{V}(g) \cup \mathcal{V}(h)$*. If* $g \equiv h$ *then* $[g\sigma]^I \equiv [h\sigma]^I$*.*

**Definition 29.** *An* $\mathcal{I}$-interpretation *is a pair* $(I, \sim)$*, where* $I \in \mathcal{I}$ *and* $\sim$ *is a congruence on* $\mathit{dom}(I)$*-graphs. An* $\mathcal{I}$*-interpretation* $(I, \sim)$ validates *a set of c-clauses* $\Gamma$ *(written* $(I, \sim) \models \Gamma$*) if* $\sim\ \models [\Gamma]^I$*.*

## **5.2 Lifting the Calculus**

In the constrained calculus, the equality of labels will not be checked when an inference rule is applied. Instead, the corresponding conditions will be extracted from the considered graphs and added to the constraints of the conclusion. We first introduce a relation stating that two $\mathcal{T}$-graphs are identical up to their labels. This relation is parameterized by a constraint that asserts conditions on the labels ensuring that the graphs are identical (modulo $\mathcal{I}$).

<sup>2</sup> For instance, the set $\{n \not\doteq i \mid i \in \mathbb{N}\}$ is unsatisfiable if $n$ is interpreted as a natural number, but admits no finite unsatisfiable subset.

**Definition 30.** *Let* $g, h$ *be two* $\mathcal{T}$*-graphs and let* $\phi \in \mathcal{C}$*. We write* $g =_{\phi} h$ *if* $N_g = N_h$*,* $E_g = E_h$*,* $R_g = R_h$*, and* $\phi = \bigwedge_{\alpha \in N_g} (L_g(\alpha) \doteq L_h(\alpha))$ *(up to associativity and commutativity of* $\land$*).*

*Example 31.* Consider the $\mathcal{T}$-graphs $g$ and $h$ below, of root $(\rho_1)$. We have $g =_{\phi} h$, with $\phi = (x \doteq 0 \land 0 \doteq y)$.
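The construction of the constraint in Definition 30 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: graphs are encoded as dictionaries with hypothetical keys `nodes`, `edges`, `roots` and `labels`, labels are plain strings, and the constraint is returned as a list of pairs (one pair per conjunct). Since the figure of Example 31 is not reproduced here, the node names and the single edge used below are invented for the illustration; only the labels follow the example.

```python
def label_constraint(g, h):
    """Return the conjuncts of phi such that g =_phi h, or None when the
    two graphs differ on their nodes, edges or roots."""
    same_structure = (
        g["nodes"] == h["nodes"]
        and sorted(g["edges"]) == sorted(h["edges"])  # edges compared as multisets
        and g["roots"] == h["roots"]
    )
    if not same_structure:
        return None
    # One equation L_g(alpha) = L_h(alpha) per node alpha.
    return [(g["labels"][n], h["labels"][n]) for n in sorted(g["nodes"])]


# Hypothetical two-node graphs with the labels of Example 31.
graph_g = {
    "nodes": {"alpha", "rho1"},
    "edges": [("rho1", "alpha")],
    "roots": ("rho1",),
    "labels": {"alpha": "x", "rho1": "0"},
}
graph_h = {**graph_g, "labels": {"alpha": "0", "rho1": "y"}}

phi = label_constraint(graph_g, graph_h)
```

Here `phi` is the list `[("x", "0"), ("0", "y")]`, read as the conjunction of the two label equations. Deciding whether such a constraint is satisfiable is left to the underlying theory and is not attempted in this sketch.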

Every relation between T -graphs or T -clauses may be adapted in a similar way, keeping the conditions on the nodes, edges and roots, and asserting conditions ensuring that the label of every given node is unique (up to equality modulo I). Definitions 32 and 33 lift the subgraph and subsumption relations, respectively:

**Definition 32.** *We write* $h \leq^{g}_{\phi} g$ *if* $N_h \subseteq N_g$*;* $E_h$ *is included in* $E_g$*; every node* $\alpha \in N_h$ *occurring in* $R_g$ *also occurs in* $R_h$*; if* $\alpha \in N_h$ *occurs in an edge in* $E_g \setminus E_h$ *then* $\alpha \in R_h$*, and* $\phi = \bigwedge_{\alpha \in N_h} L_h(\alpha) \doteq L_g(\alpha)$*. The notation* $g\{h \leftarrow i\}$ *may be extended to the case where* $h \leq^{g}_{\phi} g$ *(following Definition 6). Orthogonality is extended accordingly (as it does not depend on labels).*

**Definition 33.** *We write* $C \leq^{\mathrm{sub}}_{\phi} D$ *if* $C$ *and* $D$ *are respectively of the form (up to associativity and commutativity of* $\lor$ *and isomorphism):* $\bigvee_{i=1}^{n} g_i \bowtie_i h_i$*, and* $\bigvee_{i=1}^{n} g'_i \bowtie_i h'_i \lor D'$*, with* $g_i =_{\phi_i} g'_i$*,* $h_i =_{\psi_i} h'_i$ *(for all* $i = 1, \dots, n$*) and* $\phi = \bigwedge_{i=1}^{n} (\phi_i \land \psi_i)$*.*

The notion of a merge is extended analogously:

**Definition 34.** *A* φ-merge *of two* T *-graphs* g<sup>1</sup> *and* g<sup>2</sup> *is a* T *-graph* h *such that:* **–** N<sup>h</sup> = N<sup>g</sup><sup>1</sup> ∪ N<sup>g</sup><sup>2</sup> *,* E<sup>h</sup> = E<sup>g</sup><sup>1</sup> <sup>E</sup><sup>g</sup><sup>2</sup> *, and* <sup>N</sup><sup>h</sup> <sup>=</sup> <sup>N</sup><sup>g</sup><sup>1</sup> <sup>∪</sup> <sup>N</sup><sup>g</sup><sup>2</sup> *.*


We now lift the order relation. Let $\leq_I$ (for $I \in \mathcal{I}$) be a family of well-founded preorders on $\mathit{dom}(I)$-$\mathcal{T}$-graphs that are closed under isomorphisms and embeddings and contain $\leq^{g}$. Let $\leq_{\phi}$ (for $\phi \in \mathcal{C}$) be a family of preorders on $\mathcal{T}$-graphs satisfying the following conditions: $g >_{\phi} h \implies g >_{\psi} h$, for all constraints $\phi, \psi$ such that $\psi \models_{\mathcal{I}} \phi$, and $(I \models \phi \land g >_{\phi} h) \implies [g]^I >_I [h]^I$. The simplest solution in practice is to order $\mathcal{T}$-graphs according to their number of nodes, in which case the order does not depend on $I$ or $\phi$: $g \leq_I h \iff g \leq_{\phi} h \iff \mathit{card}(N_g) \leq \mathit{card}(N_h)$. However, our framework is meant to be general enough to cope with orders that take labels into account.

A literal $L$ is *maximal* in a c-clause $[C \mid \phi]$ if there is no literal $L' \in C$ such that $L' >_{\phi} L$. It is *eligible* in a c-clause $[C \mid \phi]$ if $L$ is a $>_{\phi}$-maximal literal in $C$.

We are now in a position to define the constrained inference rules. As for the rules in Section 4.1, they apply modulo isomorphism. We assume, as for the standard resolution or superposition calculus, that the premises share no variables. In every rule, the conclusion inherits the constraints of the premises together with additional conditions on the labels which make the inference valid. In all rules, the eligibility condition is tested after adding all the constraints enabling the inference, as this yields the most restrictive condition, thus reducing the branching factor.

$$\mathsf{Sp}^{+}: \frac{[\mathfrak{g}_{1} \approx \mathfrak{h}_{1} \lor C_{1} \mid \phi_{1}] \qquad [\mathfrak{g}_{2} \approx \mathfrak{h}_{2} \lor C_{2} \mid \phi_{2}]}{[\mathfrak{i}\{\mathfrak{g}_{1} \leftarrow \mathfrak{h}_{1}\} \approx \mathfrak{i}\{\mathfrak{g}_{2} \leftarrow \mathfrak{h}_{2}\} \lor C_{1} \lor C_{2} \mid \phi_{1} \land \phi_{2} \land \psi]}$$

where:


$$\mathsf{Sp}^{-}: \frac{[\mathfrak{g} \approx \mathfrak{h} \lor C \mid \phi] \qquad [\mathfrak{i} \not\approx \mathfrak{j} \lor D \mid \psi]}{[\mathfrak{i}\{\mathfrak{g} \leftarrow \mathfrak{h}\} \not\approx \mathfrak{j} \lor C \lor D \mid \phi \land \psi \land \xi]}$$

where:


$$\mathbf{F} : \frac{[\mathfrak{g} \approx \mathfrak{h} \lor \mathfrak{g}' \approx \mathfrak{h}' \lor C \mid \phi]}{[\mathfrak{g} \approx \mathfrak{h} \lor C \mid \phi \land \psi \land \psi']}$$

where $\mathfrak{g} \approx \mathfrak{h}$ is eligible in $[\mathfrak{g} \approx \mathfrak{h} \lor \mathfrak{g}' \approx \mathfrak{h}' \lor C \mid \phi \land \psi \land \psi']$, $\mathfrak{g} =_{\psi} \mathfrak{g}'$, and $\mathfrak{h} =_{\psi'} \mathfrak{h}'$.

$$\mathsf{R}: \frac{[\mathfrak{g} \not\approx \mathfrak{h} \lor C \mid \phi]}{[C \mid \phi \land \psi]}$$

where $\mathfrak{g} \not\approx \mathfrak{h}$ is eligible in $[\mathfrak{g} \not\approx \mathfrak{h} \lor C \mid \phi \land \psi]$ and $\mathfrak{g} =_{\psi} \mathfrak{h}$.

## **5.3 Soundness and Refutational Completeness**

We establish the soundness and completeness of the constrained calculus, by lifting the corresponding properties for the base calculus. Note that semi-decidability holds only if the base theory is semi-decidable<sup>3</sup> and compact (otherwise it is easy to see that unsatisfiability is not semi-decidable in general).

<sup>3</sup> in the sense that there exists a semi-decision procedure to check whether a formula in C is unsatisfiable.

**Lemma 35.** *The rules* $\mathsf{Sp}^{+}$*,* $\mathsf{Sp}^{-}$*,* $\mathsf{F}$ *and* $\mathsf{R}$ *(applied on c-clauses) are sound, i.e., for all* $\mathcal{I}$*-interpretations* $(I, \sim)$ *and for all c-clauses* $[C \mid \phi]$ *deducible from a set of premises* $\Gamma$*, we have* $(I, \sim) \models \Gamma \implies (I, \sim) \models [C \mid \phi]$*.*

The redundancy criterion may be lifted as follows:

**Definition 36.** *A c-clause* $[C \mid \phi]$ *is (strictly)* $\mathcal{I}$-redundant *in a set of c-clauses* $\Gamma$ *if for all ground substitutions* $\sigma$ *of domain* $\mathcal{V}(C) \cup \mathcal{V}(\phi)$ *and for all* $I \in \mathcal{I}$ *such that* $I \models \phi\sigma$*, the clause* $[C\sigma]^I$ *is (strictly) redundant in* $[\Gamma]^I$*.*

*A set of c-clauses* $\Gamma$ *is (strictly)* saturated *if every c-clause that is deducible from* $\Gamma$ *by the rules above is (strictly)* $\mathcal{I}$*-redundant in* $\Gamma$*.*

**Theorem 37.** *Let* $\Gamma$ *be a set of c-clauses. If* $\Gamma$ *is unsatisfiable and strictly saturated or Horn and saturated, then* $\Gamma$ *contains a set of c-clauses* $\{[\square \mid \phi_I] \mid I \in \mathcal{I}\}$ *such that for every* $I \in \mathcal{I}$*,* $I \models \exists \boldsymbol{x}_I.\phi_I$*, with* $\boldsymbol{x}_I = \mathcal{V}(\phi_I)$*. If, moreover,* $\mathcal{I}$ *is compact, then* $\Gamma$ *contains a finite set of c-clauses* $\{[\square \mid \phi_i] \mid i = 1, \dots, n\}$ *such that* $\bigwedge_{i=1}^{n} \neg(\exists \boldsymbol{x}_i.\phi_i)$ *is* $\mathcal{I}$*-unsatisfiable, with* $\boldsymbol{x}_i = \mathcal{V}(\phi_i)$*.*

#### **5.4 Redundancy Testing**

The redundancy criterion in Definition 36 is very general, but it may be difficult to test in practice. We thus introduce a second notion of redundancy, defined directly on constrained clauses, that is stronger and easier to decide.

**Definition 38.** *Let* $[C \mid \phi]$*,* $[D \mid \psi]$ *be two clauses and let* $\Gamma$ *be a set of clauses. Let* $\boldsymbol{x}$ *and* $\boldsymbol{y}$ *be the vectors of variables occurring in* $[C \mid \phi]$ *and* $[D \mid \psi]$*, respectively (we assume by renaming that* $\boldsymbol{x}$ *and* $\boldsymbol{y}$ *share no variable).*

*We say that* $[C \mid \phi]$ *is* subsumed *by* $[D \mid \psi]$ *and we write* $[C \mid \phi] \geq^{\mathrm{sub}} [D \mid \psi]$ *if there exists* $\xi \in \mathcal{C}$ *such that* $D \leq^{\mathrm{sub}}_{\xi} C$ *and* $\phi \models_{\mathcal{I}} \exists \boldsymbol{y}.(\psi \land \xi)$*.*

*We write* $[C \mid \phi] \to_{\Gamma} [D \mid \psi]$ *(*$[C \mid \phi]$ demodulates *to* $[D \mid \psi]$ *w.r.t.* $\Gamma$*) if* $C$ *is of the form* $g \bowtie h \lor E$*,* $D = g\{i \leftarrow j\} \bowtie h \lor E$*, and there exists a c-clause* $[F \mid \xi] \in \Gamma$ *(with free variables* $\boldsymbol{z}$*) such that* $F = (i \approx j) \lor F'$*,* $i \leq^{g}_{\xi'} g$*,* $F' \leq^{\mathrm{sub}}_{\xi''} E$*,* $\phi \models_{\mathcal{I}} \exists \boldsymbol{y}.\exists \boldsymbol{z}.(\psi \land \xi \land \xi' \land \xi'')$*,* $i >_{\xi} j$*,* $F' <_{\xi} (i \approx j)$ *and* $(i \approx j) <_{\xi} (g \bowtie h)$*.*

*A c-clause* $[C \mid \phi]$ *is* redundant *w.r.t.* $\Gamma$ *iff one of the following conditions holds: (1)* $\exists \boldsymbol{x}.\phi$ *is* $\mathcal{I}$*-unsatisfiable, with* $\boldsymbol{x} = \mathcal{V}(\phi)$*; (2)* $C$ *contains two literals* $g_1 \approx g_2$ *and* $g'_1 \not\approx g'_2$*, with* $g_i =_{\phi_i} g'_i$*, and* $\phi \models_{\mathcal{I}} \phi_i$ *(for all* $i = 1, 2$*); (3)* $C$ *contains a literal of the form* $g \approx h$ *with* $g =_{\psi} h$ *and* $\phi \models_{\mathcal{I}} \psi$*; (4)* $[C \mid \phi] \geq^{\mathrm{sub}} [D \mid \psi]$*, for some* $[D \mid \psi] \in \Gamma$*; (5)* $[C \mid \phi] \to_{\Gamma} [D \mid \psi]$ *and* $[D \mid \psi]$ *is redundant. The notion of strictly redundant c-clause is defined in a similar way, removing Item 2.*

*Example 39.* Consider the following T -graphs, of root ():

[Figure: the T-graphs g, h and i]

We have $g \approx i \leq^{\mathrm{sub}}_{\phi} h \approx i$, with $\phi = (x \doteq 0 \land y \doteq z + 1 \land 0 \doteq 0)$. Thus, if $\mathcal{I}$ only contains the standard model of Presburger arithmetic, then $[g \approx i \mid y \not\doteq 0]$ subsumes $[h \approx i \mid \top]$.


The following lemma states the relation between the new notion of redundancy and I-redundancy (as defined in Definition 36).

**Lemma 40.** *Let* Γ *be a set of c-clauses. If* [C | φ] *is (strictly) redundant w.r.t.* Γ *then it is (strictly)* I*-redundant w.r.t.* Γ*.*

*Remark 41.* By the previous definitions, checking whether a given c-clause is (strictly) redundant involves testing the validity of entailments of the form φ |=<sup>I</sup> ∃*y*.ψ, which may be infeasible in practice (for instance the problem is undecidable if I contains all interpretations). Stronger conditions may be used instead, e.g., one may check whether there exists a substitution σ such that φ is of the form ψσ ∧ ψ, which is decidable.

## **6 Conclusion**

We devised a constrained superposition calculus to test the satisfiability of sets of clauses defined over graphs. Its soundness and refutational completeness were established, modulo a redundancy criterion that captures the usual deletion and simplification rules: subsumption, demodulation, deletion of clauses with trivial equations and (in the case of Horn clauses only) deletion of clauses containing complementary literals. The considered structures are rooted directed labeled graphs, which are general enough to capture most existing equational graph theories, such as those developed for quantum circuits. In contrast to [14], the calculus is able to handle disjunctions as well as interpreted labels, and in contrast to [22], our solution avoids any encoding of graphs into terms, by defining inference rules operating directly on graphs.

From a practical point of view, it would be interesting to obtain more general redundancy criteria, to reduce the branching factor and improve the efficiency of the procedure. In particular, is it possible to define a version of the calculus in which tautology deletion is allowed, even for non-Horn clauses? As evidenced by Example 22, this would require defining a new equational factorization rule, allowing for non-trivial superposition inferences within a single clause.

Another interesting issue is to add variables denoting not only labels, but also graphs. This would allow for instance to synthesize graphs satisfying some properties. As graphs can be viewed as functions with multiple inputs and outputs (denoted by the roots) such an addition would yield a second order logic.

Finally, it would be interesting to identify fragments for which the calculus terminates, ensuring decidability of the satisfiability problem. In contrast to terms, the calculus does not terminate (and the satisfiability problem is undecidable) for ground unit clauses [14], hence strong restrictions on the shape of the graphs are required to ensure termination.

## **References**

1. L. Bachmair and H. Ganzinger. Rewrite-based equational theorem proving with selection and simplification. *Journal of Logic and Computation*, 3(4):217–247, 1994.


volume 11716 of *Lecture Notes in Computer Science*, pages 495–507. Springer, 2019.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## A Programming Language Characterizing Quantum Polynomial Time

Emmanuel Hainry, Romain Péchoux, and Mário Silva

Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France {hainry,pechoux,mmachado}@loria.fr

Abstract. We introduce a first-order quantum programming language, named foq, whose terminating programs are reversible. We restrict foq to a strict and tractable subset, named pfoq, of terminating programs with bounded width, that provides a first programming language-based characterization of the quantum complexity class fbqp. We finally present a tractable semantics-preserving algorithm compiling a pfoq program to a quantum circuit of size polynomial in the number of input qubits.

## 1 Introduction

Motivations. Quantum computing is an emerging and promising computational model that has been in the scientific limelight for several decades. This phenomenon is mainly due to the advantage of quantum computers over their classical competitors, based on the use of purely quantum properties such as superposition and entanglement. The most notable example is Shor's algorithm for finding the prime factors of an integer [15], which is exponentially faster than the most efficient known classical factoring algorithm and which is expected to have implications in cryptography (RSA encryption, etc.).

Whether due to the fragility of quantum systems, namely the engineering problem of maintaining a large number of qubits in a coherent state, or to the lack of reliable technological alternatives, quantum computing is typically described at a level close to hardware. Without any hope of being exhaustive, one can think of quantum circuits [9,11], measurement-based quantum computers [4,7] or circuit description languages [13]. This low-level machinery drastically restricts the abstraction and programming ease offered by these models, and quantum programs currently suffer from the comparison with their classical competitors, which have many high-level tools and formalisms based on more than 50 years of scientific research, engineering development, and practical and industrial applications.

In order to solve these issues, a major effort is made to realize the promise of a quantum computer, which requires the development of different layers of hardware and software, together referred to as the quantum stack. Our paper is part of this line of research. We focus on the highest layers of the quantum stack: quantum programming languages and quantum algorithms. We seek to better understand what can be done efficiently on a quantum computer and we are particularly interested in the development of quantum programming languages where program complexity can be certified automatically by some static analysis technique.

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 156–175, 2023. https://doi.org/10.1007/978-3-031-30829-1_8

Contribution. Towards this end, we take the notion of polynomial time computation as our main object of study. Our contributions are the following.


Our programming language foq and the restriction to pfoq are illustrated throughout the paper, using the Quantum Fourier Transform QFT as a leading algorithm (Example 1).

*Related work.* This paper belongs to a long-standing line of works trying to specify, understand, and analyze the semantics of quantum programming languages, starting with the cornerstone work of Selinger [14]. The motivations in restricting the considered programs to pfoq were inspired by the works on *implicit computational complexity*, that seek to characterize complexity classes by putting restrictions (type systems or others) on standard programming languages and paradigms [1,5,12]. These restrictions have to be implicit (*i.e.*, not provided by the programmer) and tractable. Among all these works, we are aware of two results [16] and [6] studying polynomial time computations on quantum programming languages, works from which our paper was greatly inspired. [6] provides a characterization of bqp based on a quantum lambda-calculus. Our work is an extension to fbqp with a restriction to first-order procedures. Last but not least, [6] is based on Yao's simulation of quantum Turing machines [17] while we provide an explicit algorithm for generating circuits of polynomial size. Our work is also inspired by the function algebra of [16], that characterizes fbqp: our completeness proof shows that any function in [16] can be simulated by a pfoq program (Theorem 6). However, we claim that foq is a more general language for fbqp insofar as it is much less constraining (in terms of expressive power) than the function algebra of [16]: any function of [16] can be, by design, transformed into a pfoq program, whereas the converse is not true. An example is the quantum Fourier transform (QFT) which, as noted in [16], cannot be exactly computed by the function algebra without an additional initial quantum function. Furthermore, the *multi-qubit recursion* construction described in [16] is more restrictive than what we allow in pfoq, since we may only call the same recursive function in each branch.

## 2 First-order quantum programming language

*Syntax and well-formedness.* We consider a quantum programming language, called foq for First-Order Quantum programming language, that includes basic data types such as Integers, Booleans, Qubits, Operators, and Sorted Sets of qubits, lists of finite length where all elements are different. A foq program has the ability to call first-order (recursive) procedures taking a sorted set of qubits as a parameter. Its syntax is provided in Figure 1.

Let x denote an integer variable and ¯p, ¯q denote sorted sets variables. The size of the sorted set stored in ¯q will be denoted by <sup>|</sup>¯q|. We can refer to the i-th qubit in ¯q as ¯q[i], with <sup>1</sup> <sup>≤</sup> <sup>i</sup> ≤ |¯q|. Hence, each non-empty sorted set variable ¯q can be viewed as a list [¯q[1],..., ¯q[|¯q|]]. The empty sorted set, of size <sup>0</sup>, will be denoted by nil and ¯q [i] will denote the sorted set obtained by removing the qubit of index i in ¯q. For notational convenience, we extend this notation by ¯q [i1,...,ik], for the list obtained by removing the qubits of indexes <sup>i</sup><sup>1</sup>,...,i<sup>k</sup> in the sorted set ¯q.
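The indexing and removal conventions on sorted sets can be illustrated by a small Python sketch. It is not part of the paper: sorted sets are modeled as Python lists of qubit names, the function names are hypothetical, and positions outside the range 1..|q̄| are simply ignored by the removal helper, which is an assumption of this sketch rather than a statement about foq.

```python
def qubit_at(sorted_set, i):
    """Return the i-th qubit of a sorted set, using 1-based indices."""
    if not 1 <= i <= len(sorted_set):
        raise IndexError(f"index {i} outside 1..{len(sorted_set)}")
    return sorted_set[i - 1]


def remove_indices(sorted_set, *indices):
    """Return the sorted set obtained by removing the qubits at the given
    1-based positions; the remaining qubits keep their relative order."""
    dropped = set(indices)
    return [q for pos, q in enumerate(sorted_set, start=1) if pos not in dropped]


qubits = ["q1", "q2", "q3", "q4"]
first = qubit_at(qubits, 1)                   # the first qubit
without_second = remove_indices(qubits, 2)    # removal of the qubit of index 2
without_ends = remove_indices(qubits, 1, 4)   # removal of several indices at once
```

Removing the only element of a one-element sorted set yields the empty list, which plays the role of nil in this encoding.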

The language also includes some constructs $\mathrm{U}^f$ to represent (unary) unitary operators, for some total function $f \in \mathbb{Z} \to [0, 2\pi) \cap \tilde{\mathbb{R}}$. The function $f$ is required to be polynomial-time approximable: its output is restricted to $\tilde{\mathbb{R}}$, the set of real numbers that can be approximated by a Turing machine for any precision $2^{-k}$ in time polynomial in $k$.


Fig. 1: Syntax of foq programs

A foq *program* P(¯q) consists of a sequence of *procedure declarations* D followed by a *program statement* S, ε denoting the empty sequence. In what follows, we will sometimes refer to program P(¯q) simply as P. Let var(S) be the set of variables appearing in the statement S. Let |P| be the size of program P, that is the total number of symbols in P.

A procedure declaration decl proc[x]($\bar{p}$){S} takes a sorted set parameter $\bar{p}$ and some optional integer parameter x as inputs. S is called the *procedure statement*, proc is the *procedure name* and belongs to a countable set Procedures. We will write $\mathrm{S}^{\mathrm{proc}}$ to refer to S, and $\mathrm{proc} \in \mathrm{P}$ holds if proc is declared in D.

Statements include a no-op instruction, applications of a unitary operator to a qubit (q ∗= $\mathrm{U}^f$(i);), sequences, (classical) conditionals, *quantum cases*, and *procedure calls* (call proc[i](s);). A quantum case qcase q of $\{0 \to \mathrm{S}_0, 1 \to \mathrm{S}_1\}$ provides a quantum control feature that will execute statements $\mathrm{S}_0$ and $\mathrm{S}_1$ in superposition. For example, the CNOT gate on qubits $\bar{q}[i]$ and $\bar{q}[j]$, for $i, j \in \mathbb{N}$, $i \neq j$, can be simulated by the following statement:

$$\text{CNOT}(\bar{q}[i], \bar{q}[j]) \triangleq \text{qcase } \bar{q}[i] \text{ of } \{0 \to \text{skip;},\ 1 \to \bar{q}[j] \mathrel{*}= \text{NOT;}\}.$$
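The control behaviour of a quantum case can be sketched on a toy state representation. The sketch below is only an illustration under stated assumptions: states are encoded as dictionaries from bitstrings to amplitudes, qubit positions are 0-based (the text uses 1-based indexes), and the helper names `apply_not`, `qcase`, and `cnot` are hypothetical. It does not model the access set $A$ of the formal semantics.

```python
from typing import Callable, Dict

State = Dict[str, complex]  # basis bitstring -> amplitude (hypothetical encoding)

def apply_not(state: State, j: int) -> State:
    """Flip qubit j (0-based position in the bitstring) in every basis state."""
    out: State = {}
    for bits, amp in state.items():
        flipped = bits[:j] + ("1" if bits[j] == "0" else "0") + bits[j + 1:]
        out[flipped] = out.get(flipped, 0) + amp
    return out

def qcase(state: State, i: int,
          branch0: Callable[[State], State],
          branch1: Callable[[State], State]) -> State:
    """Run branch0 on the part of the state where qubit i is 0 and branch1
    on the part where it is 1, then add the two results back together."""
    part0 = {b: a for b, a in state.items() if b[i] == "0"}
    part1 = {b: a for b, a in state.items() if b[i] == "1"}
    out: State = dict(branch0(part0))
    for bits, amp in branch1(part1).items():
        out[bits] = out.get(bits, 0) + amp
    return out

def cnot(state: State, i: int, j: int) -> State:
    """CNOT as a quantum case: skip in branch 0, NOT on qubit j in branch 1."""
    return qcase(state, i, lambda s: s, lambda s: apply_not(s, j))

# (|0> + |1>)/sqrt(2) on the control qubit, |0> on the target qubit
plus_zero = {"00": 2 ** -0.5, "10": 2 ** -0.5}
bell = cnot(plus_zero, 0, 1)
```

Applied to the superposed control above, the two branches are evaluated on disjoint parts of the state and recombined, which yields the entangled state with equal amplitudes on `00` and `11`.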

Throughout the paper, we restrict our study to *well-formed* programs, that is, programs $\mathrm{P} = \mathrm{D} :: \mathrm{S}$ satisfying the following properties: $\mathrm{var}(\mathrm{S}) \subseteq \{\bar{q}\}$; $\forall \mathrm{proc} \in \mathrm{P}$, $\mathrm{var}(\mathrm{S}^{\mathrm{proc}}) \subseteq \{x, \bar{p}\}$; procedure names declared in D are pairwise distinct; for each procedure call, the procedure name is declared in D.

*Semantics.* Let $\mathcal{H}_{2^n}$ be the *Hilbert space* $\mathbb{C}^{2^n}$ of $n$ qubits. We use Dirac notation to denote a quantum state $|\psi\rangle \in \mathcal{H}_{2^n}$. Each $|\psi\rangle \in \mathcal{H}_{2^n}$ can be written as a superposition of bitstrings of size $n$: $|\psi\rangle = \sum_{w \in \{0,1\}^n} \alpha_w |w\rangle$, with $\alpha_w \in \mathbb{C}$ and $\sum_w |\alpha_w|^2 = 1$. The *length* $l(|\psi\rangle)$ of the state $|\psi\rangle$ is $n$. Given two matrices $M, N$, we denote by $M^\dagger$ the conjugate transpose of $M$ and by $M \otimes N$ the tensor product of $M$ by $N$. $\langle\psi|$ is equal to $|\psi\rangle^\dagger$, and $\langle\psi|\phi\rangle$ and $|\psi\rangle\langle\phi|$ are respectively the inner product and outer product of $|\psi\rangle$ and $|\phi\rangle$. Let $I_n$ be the identity matrix in $\mathbb{C}^{n \times n}$. Given $m \le n$ and $i \in \{0, 1\}$, define $|i\rangle_m \triangleq I_{2^{m-1}} \otimes |i\rangle \otimes I_{2^{n-m}}$ and $\langle i|_m \triangleq (|i\rangle_m)^\dagger$.

A function $\llbracket \mathrm{U}^f \rrbracket \in \mathbb{Z} \to \tilde{\mathbb{C}}^{2 \times 2}$ is associated to each $\mathrm{U}^f$ as follows:

$$\llbracket \mathrm{NOT} \rrbracket(n) \triangleq \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \quad \llbracket \mathrm{R}^f_Y \rrbracket(n) \triangleq \begin{pmatrix} \cos(f(n)) & -\sin(f(n))\\ \sin(f(n)) & \cos(f(n)) \end{pmatrix}, \quad \llbracket \mathrm{Ph}^f \rrbracket(n) \triangleq \begin{pmatrix} 1 & 0\\ 0 & e^{if(n)} \end{pmatrix},$$

where $\tilde{\mathbb{C}}$ is the set of complex numbers whose real and imaginary parts are both in $\tilde{\mathbb{R}}$. One can easily check that each matrix $M \triangleq \llbracket \mathrm{U}^f \rrbracket(n) \in \tilde{\mathbb{C}}^{2 \times 2}$ is unitary, i.e., it satisfies $M^\dagger M = M M^\dagger = I_2$.
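As a worked instance of this check, write $c = \cos(f(n))$ and $s = \sin(f(n))$ (shorthands introduced only for this computation). For the rotation operator, whose matrix has rows $(c, -s)$ and $(s, c)$, the product with its conjugate transpose is

$$\begin{pmatrix} c & s\\ -s & c \end{pmatrix} \begin{pmatrix} c & -s\\ s & c \end{pmatrix} = \begin{pmatrix} c^2 + s^2 & -cs + sc\\ -sc + cs & s^2 + c^2 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} = I_2,$$

and for the phase operator, which is diagonal with entries $1$ and $e^{if(n)}$,

$$\begin{pmatrix} 1 & 0\\ 0 & e^{-if(n)} \end{pmatrix} \begin{pmatrix} 1 & 0\\ 0 & e^{if(n)} \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & e^{-if(n)} e^{if(n)} \end{pmatrix} = I_2.$$

The products in the reverse order are computed in the same way, and the NOT matrix is its own inverse.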

Let $\mathbb{B}$ be the set of Boolean values $b \in \{\text{false}, \text{true}\}$. For a given set $X$, let $\mathcal{L}(X)$ be the set of lists of elements in $X$. Let $l = [x_1, \ldots, x_m]$, with $x_1, \ldots, x_m \in X$, denote a list of $m$ elements in $\mathcal{L}(X)$ and $[\,]$ be the empty list (when $m = 0$). For $l, l' \in \mathcal{L}(X)$, $l@l'$ denotes the concatenation of $l$ and $l'$. $\mathrm{hd}(l)$ and $\mathrm{tl}(l)$ represent the head and the tail of $l$, respectively. Lists of integers will be used to represent Sorted Sets. They contain pointers to qubits (i.e., indexes) in the global memory.

We interpret each basic data type $\tau$ as follows: $\llbracket \text{Integers} \rrbracket \triangleq \mathbb{Z}$, $\llbracket \text{Booleans} \rrbracket \triangleq \mathbb{B}$, $\llbracket \text{SortedSets} \rrbracket \triangleq \mathcal{L}(\mathbb{N})$, $\llbracket \text{Qubits} \rrbracket \triangleq \mathbb{N}$, and $\llbracket \text{Operators} \rrbracket \triangleq \tilde{\mathbb{C}}^{2 \times 2}$. Each basic operation $\text{op} \in \{+, -, >, \ge, =, \wedge, \vee, \neg\}$ of arity $n$, with $1 \le n \le 2$, has a type signature $\tau_1 \times \ldots \times \tau_n \to \tau$ fixed by the program syntax. For example, the operation $+$ has signature $\text{Integers} \times \text{Integers} \to \text{Integers}$. A total function $\llbracket \text{op} \rrbracket \in \llbracket \tau_1 \rrbracket \times \ldots \times \llbracket \tau_n \rrbracket \to \llbracket \tau \rrbracket$ is associated to each op.

For each basic type $\tau$, the reduction $\Downarrow_{\llbracket \tau \rrbracket}$ is a map in $\tau \times \mathcal{L}(\mathbb{N}) \to \llbracket \tau \rrbracket$. Intuitively, it maps an expression of type $\tau$ to its value in $\llbracket \tau \rrbracket$ for a given list $l$ of pointers in memory. These reductions are defined in Figure 2, where e and d denote either an integer expression i or a boolean expression b.

Note that in rule (Rm$\notin$), if we try to delete an undefined index then we return the empty list, and in rule (Qu$\notin$), if we try to access an undefined qubit index then we return the value 0 (defined indexes will always be positive). The standard gates $R_Y(\pi/4)$, $P(\pi/4)$, and CNOT form a universal set of gates [3], which justifies the choice of NOT, $\mathrm{R}^f_Y$(i), and $\mathrm{Ph}^f$(i) as basic operators. For instance, we can simulate the application of a Hadamard gate $H$ on q by the following statement q ∗= $\mathrm{R}^f_Y$(0); q ∗= NOT;, with the function $f$ defined by $\forall n, f(n) = \pi/4 \in [0, 2\pi) \cap \tilde{\mathbb{R}}$. By abuse of notation, we will sometimes use q ∗= H; to denote this statement. Using CNOT, we can also define the SWAP operation swapping the state between two qubits $\bar{q}[i]$ and $\bar{q}[j]$, with $i, j \in \mathbb{N}$, $i \neq j$:
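The Hadamard simulation can be checked numerically. The following sketch, which uses only the standard library and hypothetical helper names (`matmul`, `r_y`), multiplies the NOT matrix with the rotation matrix for the angle $\pi/4$ and compares the result with the usual Hadamard matrix. It assumes that the statement applies the rotation first and NOT second, so the combined matrix is the product NOT times rotation.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def r_y(theta):
    """Rotation matrix with rows (cos, -sin) and (sin, cos)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

NOT = [[0, 1], [1, 0]]

# "q *= R_Y; q *= NOT;" applies the rotation first, hence NOT * R_Y.
combined = matmul(NOT, r_y(math.pi / 4))

hadamard = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]

# Largest entrywise deviation between the two matrices.
max_error = max(abs(combined[r][c] - hadamard[r][c])
                for r in range(2) for c in range(2))
```

Left-multiplying by NOT exchanges the two rows of the rotation matrix, which is why the rows $(\sin, \cos)$ and $(\cos, -\sin)$ at angle $\pi/4$ give the Hadamard matrix.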

$$\text{SWAP}(\bar{q}[i], \bar{q}[j]) \triangleq \text{CNOT}(\bar{q}[i], \bar{q}[j])\ \ \text{CNOT}(\bar{q}[j], \bar{q}[i])\ \ \text{CNOT}(\bar{q}[i], \bar{q}[j]).$$
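Since each CNOT permutes computational basis states, the three-CNOT definition of SWAP can be verified on basis states alone, and linearity extends the result to all states. The sketch below is a minimal check under that observation; the function names and the 0-based positions are assumptions made for the illustration.

```python
from itertools import product

def cnot(bits, control, target):
    """Classical action of CNOT on a tuple of bits (a computational basis state)."""
    out = list(bits)
    if out[control] == 1:
        out[target] ^= 1
    return tuple(out)

def swap_via_cnots(bits, i, j):
    """Apply CNOT(i, j), then CNOT(j, i), then CNOT(i, j)."""
    bits = cnot(bits, i, j)
    bits = cnot(bits, j, i)
    return cnot(bits, i, j)

def swaps_everywhere(n, i, j):
    """True if the three CNOTs exchange positions i and j on every n-bit basis state."""
    for bits in product((0, 1), repeat=n):
        expected = list(bits)
        expected[i], expected[j] = expected[j], expected[i]
        if swap_via_cnots(bits, i, j) != tuple(expected):
            return False
    return True
```

On a pair of bits $(a, b)$ the three steps produce $(a, a \oplus b)$, then $(b, a \oplus b)$, then $(b, a)$.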

Let $\top$ and $\bot$ be two special symbols for termination and error, respectively, and let $\diamond$ stand for a symbol in $\{\top, \bot\}$. The set of configurations of dimension $2^n$, denoted $\mathrm{Conf}_n$, is defined by

$$\mathrm{Conf}_n \triangleq (\mathrm{Statements} \cup \{\top, \bot\}) \times \mathcal{H}_{2^n} \times \mathcal{P}(\mathbb{N}) \times \mathcal{L}(\mathbb{N}),$$

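with $\mathcal{P}(\mathbb{N})$ being the powerset over $\mathbb{N}$. A configuration $c = (\mathrm{S}, |\psi\rangle, A, l) \in \mathrm{Conf}_n$ contains a statement S to be executed (provided that $\mathrm{S} \notin \{\top, \bot\}$), a quantum state $|\psi\rangle$ of length $n$, a set $A$ containing the indexes of qubits that are allowed to be accessed by statement S, and a list $l$ of qubit pointers.

$$\begin{array}{ll}
\text{(Cst)} & (n, l) \Downarrow_{\mathbb{Z}} n \\[1ex]
\text{(Nil)} & (\text{nil}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [\,] \\[1ex]
\text{(Var)} & (\bar{q}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} l \\[1ex]
\text{(Op)} & \dfrac{(\text{e}, l) \Downarrow_{\llbracket \tau_1 \rrbracket} m \qquad (\text{d}, l) \Downarrow_{\llbracket \tau_2 \rrbracket} n}{(\text{e op d}, l) \Downarrow_{\llbracket \text{op}(\tau_1, \tau_2) \rrbracket} \llbracket \text{op} \rrbracket(m, n)} \\[3ex]
\text{(Unit)} & \dfrac{(\text{i}, l) \Downarrow_{\mathbb{Z}} n}{(\mathrm{U}^f(\text{i}), l) \Downarrow_{\tilde{\mathbb{C}}^{2 \times 2}} \llbracket \mathrm{U}^f \rrbracket(n)} \\[3ex]
\text{(Size)} & \dfrac{(\text{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_n]}{(|\text{s}|, l) \Downarrow_{\mathbb{Z}} n} \\[3ex]
\text{(Rm}{\in}\text{)} & \dfrac{(\text{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_m] \qquad (\text{i}, l) \Downarrow_{\mathbb{Z}} k \in [1, m]}{(\text{s} \ominus [\text{i}], l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_m]} \\[3ex]
\text{(Rm}{\notin}\text{)} & \dfrac{(\text{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_m] \qquad (\text{i}, l) \Downarrow_{\mathbb{Z}} k \notin [1, m]}{(\text{s} \ominus [\text{i}], l) \Downarrow_{\mathcal{L}(\mathbb{N})} [\,]} \\[3ex]
\text{(Qu}{\in}\text{)} & \dfrac{(\text{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_m] \qquad (\text{i}, l) \Downarrow_{\mathbb{Z}} k \in [1, m]}{(\text{s}[\text{i}], l) \Downarrow_{\mathbb{N}} x_k} \\[3ex]
\text{(Qu}{\notin}\text{)} & \dfrac{(\text{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} [x_1, \ldots, x_m] \qquad (\text{i}, l) \Downarrow_{\mathbb{Z}} k \notin [1, m]}{(\text{s}[\text{i}], l) \Downarrow_{\mathbb{N}} 0}
\end{array}$$

Fig. 2: Reduction rules for expressions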
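The sorted-set rules (Size), (Rm) and (Qu) have a direct reading on ordinary lists. The sketch below is a hypothetical rendering with illustrative function names: it uses 1-based indexes as in the text, returns the empty list when an undefined index is removed, and returns 0 when an undefined qubit index is accessed.

```python
def size(xs):
    """(Size): number of pointers in the sorted set."""
    return len(xs)

def remove(xs, k):
    """(Rm) rules: drop the k-th pointer (1-based); an undefined index gives []."""
    if 1 <= k <= len(xs):
        return xs[:k - 1] + xs[k:]
    return []

def qubit(xs, k):
    """(Qu) rules: pointer stored at 1-based position k; an undefined index gives 0."""
    if 1 <= k <= len(xs):
        return xs[k - 1]
    return 0

pointers = [4, 7, 9]               # example list of qubit pointers
```

For instance, `remove(pointers, 2)` keeps the pointers 4 and 9, while `qubit(pointers, 0)` falls into the undefined case and yields 0, a value that no defined (positive) index can take.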

The program big-step semantics $\longrightarrow$, described in Figure 3, is defined as a relation in $\bigcup_{n \in \mathbb{N}} \mathrm{Conf}_n \times \mathrm{Conf}_n$. In the rules of Figure 3, $\longrightarrow$ is annotated by an integer, called *level*. For example, the level of the conclusion in the (Call[ ]) rule is 1. The level is used to count the total number of procedure calls that are not in superposition (*i.e.*, in distinct branches of a quantum case).

We now give a brief intuition on the rules of Figure 3. Rules (Asg⊥) and (Asg) evaluate the application of a unitary operator, corresponding to $\mathrm{U}^f$(j), to a qubit s[i]. For that purpose, they evaluate the index $n$ of s[i] in the global memory. Rule (Asg⊥) deals with the error case, where the corresponding qubit is not allowed to be accessed. Rule (Asg) deals with the success case: the new quantum state is obtained by tensoring the evaluation of $\mathrm{U}^f$(j) with identities so that it acts on the right index, and applying the result to the state. Rules (Seq) and (Seq⊥) evaluate the sequence of statements, depending on whether an error occurs or not. The (If) rule deals with classical conditionals in a standard way. The three rules (Case), (Case⊥), and (Case$\notin$) evaluate the qubit index $n$ of the control qubit s[i]. Then they check whether this index belongs to the set of accessible qubits (is $n$ in $A$?). If so, the two statements $\mathrm{S}_0$ and $\mathrm{S}_1$ are intuitively evaluated in superposition, on the projected states $\langle 0|_n |\psi\rangle$ and $\langle 1|_n |\psi\rangle$, respectively. During these evaluations, the index $n$ cannot be accessed anymore. The rule (Call[ ]) treats the base case of a procedure call when the sorted set parameter is empty. In the non-empty case, rule (Call) evaluates the sorted set parameter s to $l'$ and the integer parameter x to $n$. It returns the result of evaluating the procedure statement $\mathrm{S}^{\mathrm{proc}}\{n/\mathrm{x}\}$, where $n$ has been substituted for x, w.r.t. the updated qubit pointers list $l'$.

For a given program $\mathrm{P} = \mathrm{D} :: \mathrm{S}$ and a given quantum state $|\psi\rangle \in \mathcal{H}_{2^n}$, the *initial configuration* for input $|\psi\rangle$ is $c_{init}(|\psi\rangle) \triangleq (\mathrm{S}, |\psi\rangle, \{1, \ldots, n\}, [1, \ldots, n]) \in \mathrm{Conf}_n$. A program is *error-free* if there is no initial configuration $c_{init}(|\psi\rangle)$ such that $c_{init}(|\psi\rangle) \longrightarrow (\bot, |\psi'\rangle, A, l)$. We write $\llbracket \mathrm{P} \rrbracket(|\psi\rangle) = |\psi'\rangle$ whenever $c_{init}(|\psi\rangle) \xrightarrow{m} (\top, |\psi'\rangle, A, l)$ holds for some $m$. $(\top, |\psi'\rangle, A, l)$ is called a *terminal configuration*. Let $\mathcal{H} = \bigcup_n \mathcal{H}_{2^n}$; a program *terminates* if $\llbracket \mathrm{P} \rrbracket$ is a total function in $\mathcal{H} \to \mathcal{H}$. Note that if a program terminates then it is obviously error-free but the converse property does not hold. Every program P can be efficiently transformed into an error-free program $\mathrm{P}_{\neg\bot}$ such that $\forall |\psi\rangle$, if $\llbracket \mathrm{P} \rrbracket(|\psi\rangle)$ is defined then $\llbracket \mathrm{P} \rrbracket(|\psi\rangle) = \llbracket \mathrm{P}_{\neg\bot} \rrbracket(|\psi\rangle)$. For example, an assignment s[i] ∗= $\mathrm{U}^f$(j); can be transformed into the conditional statement if ((0 < i) ∧ (i ≤ |s|)) then s[i] ∗= $\mathrm{U}^f$(j); else skip;.

*Example 1.* A notable example of a quantum algorithm is the Quantum Fourier Transform (QFT), used as a subroutine in Shor's algorithm [15], and whose quantum circuit is provided below, with $R_n \triangleq \llbracket \mathrm{Ph}^{\lambda x. \pi/2^{x-1}} \rrbracket(n)$, for $n \ge 2$. After applying Hadamard and controlled $R_n$ gates, the circuit performs a permutation of qubits using swap gates.

Note that $\lambda x. \pi/2^{x-1}$ is a total function in $\mathbb{Z} \to [0, 2\pi) \cap \tilde{\mathbb{R}}$. Hence, it is polynomial-time approximable. The above circuit can be simulated for any number of qubits $|\bar{q}|$ by the following foq program QFT.


call rec(¯q); call inv(¯q);

*Derivation tree and level.* Given a configuration $c$ wrt a fixed program P, $\pi_{\mathrm{P}} \triangleright c$ denotes the *derivation tree* of P, the tree of root $c$ whose children are obtained by applying the rules of Figures 2 and 3 on configuration $c$ with respect to P. We write $\pi$ instead of $\pi_{\mathrm{P}} \triangleright c$ when P and $c$ are clear from the context. Note that a derivation tree $\pi$ can be infinite in the particular case of a non-terminating computation. When $\pi$ is finite, $\pi \trianglelefteq \pi'$ denotes that $\pi$ is a subtree of $\pi'$.

In the case of a terminating computation $\pi \triangleright c$, there exists a terminal configuration $c'$ and a level $m \in \mathbb{N}$ such that $c \xrightarrow{m} c'$ holds. In this case, the level of $\pi$ is defined as $\mathrm{lv}_{\pi} \triangleq m$. Given a foq program P that terminates, $\mathrm{level}_{\mathrm{P}}$ is a total function in $\mathbb{N} \to \mathbb{N}$ defined as $\mathrm{level}_{\mathrm{P}}(n) \triangleq \max_{|\psi\rangle \in \mathcal{H}_{2^n}} \mathrm{lv}_{\pi_{\mathrm{P}} \triangleright c_{init}(|\psi\rangle)}$.

Intuitively, levelP(n) corresponds to the maximal number of non-superposed procedure calls in any program execution on an input of length n.

*Example 2.* Consider the program QFT of Example 1. Assume temporarily that QFT terminates (this will be shown in Example 3). For all $n \in \mathbb{N}$, $\mathrm{level}_{\mathrm{QFT}}(n) = \frac{(n+1)(n+2)}{2} + \lfloor \frac{n}{2} \rfloor + 1$. Indeed, on sorted sets of size $n$, procedure rec is called recursively $n + 1$ times and makes $n + 1$ calls to procedure rot on sorted sets of size $n$, $n - 1$, ..., and 1. On sorted sets of size $n$, rot performs $n$ recursive calls. Hence the total number of calls to rot is equal to $\sum_{i=1}^{n} i$. Finally, on a sorted set of size $n$, procedure inv does $\lfloor \frac{n}{2} \rfloor + 1$ recursive calls. Summing the three contributions gives $(n + 1) + \frac{n(n+1)}{2} + \lfloor \frac{n}{2} \rfloor + 1$, which is the stated value.
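The arithmetic of this example can be replayed mechanically. The sketch below counts calls according to the prose only (the listing of QFT is not reproduced here), so it is a reading of the example rather than an execution of the program. It assumes that the contribution of inv is the integer part of $n/2$ plus one; the function names are hypothetical.

```python
def level_qft(n):
    """Number of non-superposed calls, following the counts given in Example 2."""
    rec_calls = n + 1                    # rec is called n + 1 times
    rot_calls = sum(range(1, n + 1))     # sum of i for i = 1..n
    inv_calls = n // 2 + 1               # assumes floor(n / 2) + 1
    return rec_calls + rot_calls + inv_calls

def closed_form(n):
    """Closed form (n + 1)(n + 2) / 2 + floor(n / 2) + 1."""
    return (n + 1) * (n + 2) // 2 + n // 2 + 1

# Compare the two expressions on a range of input lengths.
agree = all(level_qft(n) == closed_form(n) for n in range(200))
```

The integer division in `closed_form` is exact because $(n+1)(n+2)$ is a product of two consecutive integers and is therefore even.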

A program P is reversible if it terminates and there exists a program $\mathrm{P}^{-1}$ such that $\mathrm{P}^{-1} \circ \mathrm{P} = \mathrm{Id}$.

Theorem 1. *All terminating* foq *programs are reversible.*

## 3 Polynomial time soundness

In this section, we restrict the set of foq programs to a strict subset, named pfoq, that is sound for the quantum complexity class fbqp. For this, we define two criteria: a criterion ensuring that a program terminates and a criterion preventing a terminating program from having an exponential runtime.

*Polynomial-time* foq*.* Given two statements $\mathrm{S}, \mathrm{S}'$, we write $\mathrm{S} \in \mathrm{S}'$ to mean that S is a substatement of $\mathrm{S}'$, and $\mathrm{proc} \in \mathrm{S}$ holds if there are i and s such that call proc[i](s); $\in \mathrm{S}$. Given a program $\mathrm{P} = \mathrm{D} :: \mathrm{S}$, we define the relation $>_{\mathrm{P}} \subseteq \text{Procedures} \times \text{Procedures}$ by $\mathrm{proc}_1 >_{\mathrm{P}} \mathrm{proc}_2$ if $\mathrm{proc}_2 \in \mathrm{S}^{\mathrm{proc}_1}$, for any two procedures $\mathrm{proc}_1, \mathrm{proc}_2 \in \mathrm{S}$. Let the partial order $\succeq_{\mathrm{P}}$ be the transitive and reflexive closure of $>_{\mathrm{P}}$ and define the equivalence relation $\sim_{\mathrm{P}}$ by $\mathrm{proc}_1 \sim_{\mathrm{P}} \mathrm{proc}_2$ if $\mathrm{proc}_1 \succeq_{\mathrm{P}} \mathrm{proc}_2$ and $\mathrm{proc}_2 \succeq_{\mathrm{P}} \mathrm{proc}_1$ both hold. Define also the strict order $\succ_{\mathrm{P}}$ by $\mathrm{proc}_1 \succ_{\mathrm{P}} \mathrm{proc}_2$ if $\mathrm{proc}_1 \succeq_{\mathrm{P}} \mathrm{proc}_2$ and $\mathrm{proc}_1 \not\sim_{\mathrm{P}} \mathrm{proc}_2$ both hold.

Definition 1. *Let* wf *be the set of* foq *programs* P *that are error-free and satisfy the well-foundedness constraint:* $\forall \mathrm{proc} \in \mathrm{P}$, $\forall\, \text{call proc}'[\text{i}](\text{s}); \in \mathrm{S}^{\mathrm{proc}}$,

$$\mathrm{proc} \sim_{\mathrm{P}} \mathrm{proc}' \Rightarrow \exists k > 0, \exists \mathrm{i}_1, \dots, \mathrm{i}_k, \ \mathrm{s} = \bar{\mathrm{p}} \ominus [\mathrm{i}_1, \dots, \mathrm{i}_k].$$

Lemma 1 *If* $\mathrm{P} \in$ wf*, then* P *terminates.*

*Example 3.* Consider the program QFT of Example 1. The statements of the procedure declarations define the following relation: rec $>_{\mathrm{QFT}}$ rec, rec $>_{\mathrm{QFT}}$ rot, rot $>_{\mathrm{QFT}}$ rot, and inv $>_{\mathrm{QFT}}$ inv. Consequently, rec $\sim_{\mathrm{QFT}}$ rec, rot $\sim_{\mathrm{QFT}}$ rot, inv $\sim_{\mathrm{QFT}}$ inv, and rec $\succ_{\mathrm{QFT}}$ rot hold. For each call to an equivalent procedure, we check that the argument decreases: $\bar{p} \ominus [1]$ in rec, $\bar{p} \ominus [2]$ in rot, and $\bar{p} \ominus [1, |\bar{p}|]$ in inv. Consequently, QFT $\in$ wf. We deduce from Lemma 1 that QFT terminates.

We now add a further restriction on mutually recursive procedure calls for guaranteeing polynomial time using a notion of width.

Definition 2. *Given a program* P *and a procedure* $\mathrm{proc} \in \mathrm{P}$*, the* width *of* proc *in* P*, noted* $\mathrm{width}_{\mathrm{P}}(\mathrm{proc})$*, and the* width *of* proc *in* P relatively to statement S*, noted* $w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S})$*, are two positive integers in* $\mathbb{N}$*. They are defined inductively by:*

$$\begin{array}{l}
\mathrm{width}_{\mathrm{P}}(\mathrm{proc}) \triangleq w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}^{\mathrm{proc}}),\\
w_{\mathrm{P}}^{\mathrm{proc}}(\text{skip;}) \triangleq 0,\\
w_{\mathrm{P}}^{\mathrm{proc}}(\text{q} \mathrel{*}= \mathrm{U}^{f}(\text{i});) \triangleq 0,\\
w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_1\ \mathrm{S}_2) \triangleq w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_1) + w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_2),\\
w_{\mathrm{P}}^{\mathrm{proc}}(\text{if b then } \mathrm{S}_{\text{true}} \text{ else } \mathrm{S}_{\text{false}}) \triangleq \max(w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_{\text{true}}), w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_{\text{false}})),\\
w_{\mathrm{P}}^{\mathrm{proc}}(\text{qcase q of } \{0 \to \mathrm{S}_0, 1 \to \mathrm{S}_1\}) \triangleq \max(w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_0), w_{\mathrm{P}}^{\mathrm{proc}}(\mathrm{S}_1)),\\
w_{\mathrm{P}}^{\mathrm{proc}}(\text{call proc}'[\text{i}](\text{s});) \triangleq \begin{cases} 1 & \text{if } \mathrm{proc} \sim_{\mathrm{P}} \mathrm{proc}',\\ 0 & \text{otherwise}. \end{cases}
\end{array}$$

Definition 3 (PFOQ). *Let* pfoq *be the set of programs* P *in* wf *that satisfy the following constraint:* $\forall \mathrm{proc} \in \mathrm{P}$, $\mathrm{width}_{\mathrm{P}}(\mathrm{proc}) \le 1$*.*

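*Example 4.* In the program of Example 1, $\mathrm{width}_{\mathrm{QFT}}(\mathrm{rec}) = \mathrm{width}_{\mathrm{QFT}}(\mathrm{rot}) = \mathrm{width}_{\mathrm{QFT}}(\mathrm{inv}) = 1$, since rec $\succ_{\mathrm{QFT}}$ rot holds. Since QFT $\in$ wf, by Example 3, we conclude that QFT is a pfoq program.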
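The width computation can be sketched on a small abstract syntax. Everything in the block below is an assumption made for illustration: statements are encoded as nested tuples, the equivalence of procedures is decided by mutual reachability in the call graph, and the procedure bodies are hypothetical skeletons that only reproduce the call relation listed in Example 3 (the actual bodies of rec, rot and inv are not reproduced here). The procedure `bad` is not from the paper; it is included to show a width of 2.

```python
def callees(stmt):
    """Names of all procedures called somewhere inside stmt."""
    kind = stmt[0]
    if kind == "call":
        return {stmt[1]}
    if kind in ("seq", "if", "qcase"):
        return callees(stmt[1]) | callees(stmt[2])
    return set()  # skip and unitary applications

def reachable(procs, start):
    """Procedures reachable from start through one or more calls."""
    seen, todo = set(), [start]
    while todo:
        for name in callees(procs[todo.pop()]):
            if name not in seen:
                seen.add(name)
                todo.append(name)
    return seen

def equivalent(procs, a, b):
    """a ~ b: each procedure is reachable from the other (reflexive by definition)."""
    return a == b or (b in reachable(procs, a) and a in reachable(procs, b))

def width(procs, proc, stmt=None):
    """Width of proc, or of proc relative to stmt, following Definition 2."""
    stmt = procs[proc] if stmt is None else stmt
    kind = stmt[0]
    if kind == "seq":
        return width(procs, proc, stmt[1]) + width(procs, proc, stmt[2])
    if kind in ("if", "qcase"):
        return max(width(procs, proc, stmt[1]), width(procs, proc, stmt[2]))
    if kind == "call":
        return 1 if equivalent(procs, proc, stmt[1]) else 0
    return 0

SKIP, GATE = ("skip",), ("unitary",)
procs = {
    # Hypothetical skeletons with the call relation of Example 3.
    "rec": ("seq", GATE, ("seq", ("call", "rot"), ("call", "rec"))),
    "rot": ("qcase", SKIP, ("seq", GATE, ("call", "rot"))),
    "inv": ("seq", GATE, ("call", "inv")),
    # Not from the paper: two sequential self-calls.
    "bad": ("seq", ("call", "bad"), ("call", "bad")),
}
widths = {name: width(procs, name) for name in procs}
```

In this encoding the call from rec to rot contributes 0 because rot never calls rec back, so the two procedures are not equivalent, whereas the two sequential self-calls in `bad` add up to 2 and would violate the constraint of Definition 3.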

We now show that the level of a pfoq program is bounded by a polynomial in the length of its input.

Lemma 2 *For each* pfoq *program* P*, there exists a polynomial* $Q \in \mathbb{N}[X]$ *such that* $\forall n \in \mathbb{N}$, $\mathrm{level}_{\mathrm{P}}(n) \le Q(n)$*.*

Moreover, checking whether a program is pfoq is tractable.

Theorem 2. *For each* foq *program* P*, it can be decided in time* $O(|\mathrm{P}|^2)$ *whether* $\mathrm{P}_{\neg\bot} \in$ pfoq*.*

*Quantum Turing machines and FBQP.* Following Bernstein and Vazirani [2], a $k$-tape *Quantum Turing Machine* (QTM), with $k \ge 1$, is defined by a triplet $(\Sigma, Q, \delta)$ where $\Sigma$ is a finite alphabet including a blank symbol #, $Q$ is a finite set of states with an initial state $s_0$ and a final state $s_\top \neq s_0$, and $\delta$ is the quantum transition function in $Q \times \Sigma^k \to \tilde{\mathbb{C}}^{Q \times \Sigma^k \times \{L,N,R\}^k}$, $\{L, N, R\}$ being the set of possible movements of a head on a tape. Each tape of the QTM is two-way infinite and contains cells indexed by $\mathbb{Z}$. A QTM successfully terminates if it reaches a superposition of only the final state $s_\top$. A QTM is said to be *well-formed* if the transition function $\delta$ preserves the norm of the superposition (or, equivalently, if the time evolution of the machine is unitary). The starting position of the tape heads is the *start cell*, the cell indexed by 0. If the machine terminates with all of its tape heads back on the start cells, it is called *stationary*. We will use *stationary* in the case where the machine terminates with its input tape head in the first cell, and all other tape heads in the last non-blank cell. We will further refer to a QTM as being *in normal form* if the only transitions from the final state $s_\top$ are towards the initial state $s_0$. These will be important conditions for the composition and branching constructions of QTMs. If a QTM is well-formed, stationary, and in normal form, we will call it *conservative* [16] (N.B.: our notion of stationary QTM differs but can be shown to be equivalent to the definition of stationary QTM in [16]).

A configuration $\gamma$ of a $k$-tape QTM is a tuple $(s, w, n)$, where $s$ is a state in $Q$, $w$ is a $k$-tuple of words in $\Sigma^{\ast}$, and $n$ is a $k$-tuple of indexes (head positions) in $\mathbb{Z}$. An initial (resp. final) configuration $\gamma_{init}$ (resp. $\gamma_{fin}$) is a configuration of the shape $(s_0, w, 0)$ (resp. $(s_\top, w, 0)$). We use $\gamma(w)$ to denote a configuration $\gamma$ where the word $w$ is written on the input/output tape. Following [2], we write $\mathcal{S}$ to represent the inner-product space of finite complex linear combinations of configurations of the QTM $M$ with the Euclidean norm. A QTM $M$ defines a linear time operator $U_M : \mathcal{S} \to \mathcal{S}$, that outputs a superposition of configurations $\sum_i \alpha_i |\gamma_i\rangle$ obtained by applying a single-step transition of $M$ to a configuration $|\gamma\rangle$ (*i.e.*, $U_M |\gamma\rangle = \sum_i \alpha_i |\gamma_i\rangle$). Let $U_M^t$, for $t \ge 1$, be the $t$-steps transition obtained from $U_M$ as follows: $U_M^1 \triangleq U_M$ and $U_M^{t+1} \triangleq U_M \circ U_M^t$. Given a quantum state $|\psi\rangle = \sum_{w \in \{0,1\}^n} \alpha_w |w\rangle$ and a configuration $\gamma$, let $\gamma(|\psi\rangle) \in \mathcal{S}$ be the quantum configuration defined by $\gamma(|\psi\rangle) \triangleq \sum_{w \in \{0,1\}^n} \alpha_w |\gamma(w)\rangle$.

A quantum function $f : \mathcal{H} \to \mathcal{H}$ is computed by the QTM $M$ in time $t$ if for any $|\psi\rangle \in \mathcal{H}$, $U_M^t(\gamma_{init}(|\psi\rangle)) = \gamma_{fin}(f(|\psi\rangle))$. Given $T : \mathbb{N} \to \mathbb{N}$ and a quantum function $f$, we say that the QTM $M$ computes $f$ in time $T$ if for inputs of length $n$, $M$ computes $f$ in time $T(n)$.

Definition 4. *Given two functions* $f : \{0,1\}^{\ast} \to \{0,1\}^{\ast}$*,* $F : \mathcal{H} \to \mathcal{H}$*, and a value* $p \in [0, 1]$*, we say that* $f$ *is computed by* $F$ *with probability* $p$ *if* $\forall x \in \{0,1\}^{\ast}$, $|\langle f(x) | F(|x\rangle) \rangle|^2 \ge p$*.*
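The success probability in this definition can be made concrete on a toy example. The sketch below is illustrative only: states are encoded as dictionaries from bitstrings to amplitudes, the check is run on a finite list of inputs (the definition quantifies over all binary words), and the quantum function used is a rotation of the first qubit by an angle $\theta$, chosen here as a hypothetical noisy implementation of the identity function. All names are assumptions.

```python
import math

def success_probability(output_state, expected):
    """Squared modulus of the amplitude of the basis state `expected`."""
    return abs(output_state.get(expected, 0)) ** 2

def computes_with_probability(f, F, inputs, p):
    """Check the inequality of Definition 4 on a finite list of input words."""
    return all(success_probability(F({x: 1}), f(x)) >= p for x in inputs)

def rot_first_qubit(state, theta):
    """Rotate the first qubit: |0> -> cos|0> + sin|1>, |1> -> -sin|0> + cos|1>."""
    c, s = math.cos(theta), math.sin(theta)
    out = {}
    for bits, amp in state.items():
        rest = bits[1:]
        if bits[0] == "0":
            contributions = (("0" + rest, c * amp), ("1" + rest, s * amp))
        else:
            contributions = (("0" + rest, -s * amp), ("1" + rest, c * amp))
        for key, value in contributions:
            out[key] = out.get(key, 0) + value
    return out

def identity(x):
    """Classical target function of the example."""
    return x

words = ["0", "1", "01", "10"]
small_angle_ok = computes_with_probability(
    identity, lambda s: rot_first_qubit(s, 0.3), words, 2 / 3)
large_angle_ok = computes_with_probability(
    identity, lambda s: rot_first_qubit(s, math.pi / 4), words, 2 / 3)
```

With $\theta = 0.3$ the probability of reading back the input is $\cos^2(0.3)$, which is above $2/3$, whereas with $\theta = \pi/4$ it is about one half and the check fails.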

The class fbqp is the functional extension of the complexity class bqp.

Definition 5 ([2]). *A function* $f \in \{0,1\}^{\ast} \to \{0,1\}^{\ast}$ *is in* fbqp *iff there exist a QTM* $M$ *and a polynomial* $P \in \mathbb{N}[X]$ *s.t.* $M$ *computes* $f$ *in time* $P$ *with probability* $\frac{2}{3}$*.*

A function $f \in \{0,1\}^{\ast} \to \{0,1\}^{\ast}$ has a *polynomial bound* $P \in \mathbb{N}[X]$ if $\forall n \in \mathbb{N}$, $\forall x \in \{0,1\}^n$, $\exists k \le P(n)$, $f(x) \in \{0,1\}^k$. Functions in fbqp have a polynomial bound as the size of their output is smaller than the polynomial time bound.

*Soundness.* We show that QTMs can simulate the function computed by any terminating foq program. The time complexity of this simulation depends on the length of the input quantum state and on the level of the considered program.

Lemma 3 *For any terminating* foq *program* P*, there exists a conservative QTM* $M$ *that computes* $\llbracket \mathrm{P} \rrbracket$ *in time* $O(n + n \times \mathrm{level}_{\mathrm{P}}(n))$*.*

Now we show that any pfoq program computes a fbqp function.

Theorem 3. *Let* P *be a* pfoq *program,* $f : \{0,1\}^{\ast} \to \{0,1\}^{\ast}$ *a function, and* $p \in (\frac{1}{2}, 1]$ *a value. If* $f$ *is computed by* $\llbracket \mathrm{P} \rrbracket$ *with probability* $p$*, then* $f \in$ fbqp*.*

*Proof.* Using Lemma 2 and Lemma 3.

## 4 FBQP completeness

In this section we show that any function in fbqp can be faithfully approximated by a pfoq program. Toward this end, we show that Yamakami's [16] fbqp-complete function algebra can be exactly simulated in pfoq.

*Yamakami's function algebra.* A characterization of fbqp was provided in [16] using a function algebra, named $\widehat{\square_1^{QP}}$. Given a quantum state $|\psi\rangle$ and a word $w \in \{0,1\}^n$, with $n \le l(|\psi\rangle)$, $|\psi\rangle$ can be written as $|\psi\rangle = \sum_i \alpha_i |w_i z_i\rangle$, with $w_i \in \{0,1\}^n$ and $z_i \in \{0,1\}^{l(|\psi\rangle) - n}$. We write $\langle w|\psi\rangle$ as an abuse of notation for the quantum state defined by $\langle w|\psi\rangle \triangleq \sum_i \alpha_i \langle w|w_i\rangle |z_i\rangle$.

Definition 6. $\widehat{\square_1^{QP}}$ *is the smallest class of functions including the basic initial functions* $\{I, \mathrm{Ph}_\theta, \mathrm{Rot}_\theta, \mathrm{NOT}, \mathrm{SWAP}\}$*, with* $\theta \in [0, 2\pi) \cap \tilde{\mathbb{C}}$*,*

$$\begin{array}{l}
-\ \operatorname{I}(|\psi\rangle) \triangleq |\psi\rangle \\
-\ \operatorname{Ph}_{\theta}(|\psi\rangle) \triangleq |0\rangle \langle 0|\psi\rangle + e^{i\theta} |1\rangle \langle 1|\psi\rangle \\
-\ \operatorname{Rot}_{\theta}(|\psi\rangle) \triangleq \cos\theta\, |\psi\rangle + \sin\theta\, (|1\rangle \langle 0|\psi\rangle - |0\rangle \langle 1|\psi\rangle) \\
-\ \operatorname{NOT}(|\psi\rangle) \triangleq |0\rangle \langle 1|\psi\rangle + |1\rangle \langle 0|\psi\rangle \\
-\ \operatorname{SWAP}(|\psi\rangle) \triangleq \begin{cases} |\psi\rangle & \text{if } l(|\psi\rangle) \le 1 \\ \sum_{a,b \in \{0,1\}} |ba\rangle \langle ab|\psi\rangle & \text{otherwise} \end{cases}
\end{array}$$

*and closed under schemes* Comp*,* Branch*, and* $kQ\mathrm{Rec}_t$*, for* $k, t \in \mathbb{N}$*,*

$$\begin{aligned}
& -\operatorname{Comp}[F,G](|\psi\rangle) \triangleq F(G(|\psi\rangle)) \\
& -\operatorname{Branch}[F,G](|\psi\rangle) \triangleq \begin{cases} |\psi\rangle & \text{if } l(|\psi\rangle) \le 1 \\ |0\rangle \otimes F(\langle 0|\psi\rangle) + |1\rangle \otimes G(\langle 1|\psi\rangle) & \text{otherwise} \end{cases} \\
& -kQ\operatorname{Rec}_{t}[F,G,H](|\psi\rangle) \triangleq \begin{cases} F(|\psi\rangle) & \text{if } l(|\psi\rangle) \le t \\ G\left(\sum_{w \in \{0,1\}^{k}} |w\rangle \otimes F_{w}(\langle w|H(|\psi\rangle))\right) & \text{otherwise} \end{cases} \\
& \text{where each } F_{w} \in \{kQ\operatorname{Rec}_{t}[F,G,H], I\}.
\end{aligned}$$

To handle general fbqp functions, [16] defines the extended encoding of an input $x \in \{0,1\}^{\ast}$ as $\phi_P(|x\rangle) \triangleq |0^{l(|x\rangle)} 1\rangle |0^{P(l(|x\rangle))} 1 0^{11 P(l(|x\rangle)) + 6} 1\rangle |x\rangle$, for some polynomial $P \in \mathbb{N}[X]$ that is an upper bound on the output size of the desired fbqp function. $\phi_P$ simply consists of the quantum state $|x\rangle$ preceded by a polynomial number of ancilla qubits. These ancillas provide space for internal computations and account for the polynomial bound associated to polynomial time QTMs.

Theorem 4 ([16]). *Given* $f : \{0,1\}^{\ast} \to \{0,1\}^{\ast}$ *with polynomial bound* $P \in \mathbb{N}[X]$*, the following statements are equivalent.*


We show the following result by structural induction on a function in $\widehat{\square_1^{QP}}$.

Theorem 5. *Let* $F$ *be a function in* $\widehat{\square_1^{QP}}$*. Then there exists a* pfoq *program* P *such that* $\llbracket \mathrm{P} \rrbracket = F$*.*

We are now ready to state the completeness result.

Theorem 6. *For every function* $f$ *in* fbqp *with polynomial bound* $Q \in \mathbb{N}[X]$*, there is a* pfoq *program* P *such that* $\llbracket \mathrm{P} \rrbracket \circ \phi_Q$ *computes* $f$ *with probability* $\frac{2}{3}$*.*

*Proof.* By Theorem 4 and Theorem 5.

## 5 Compilation to polynomial-size quantum circuits

In this section, we provide an algorithm that compiles a pfoq program on a given input length $n \in \mathbb{N}$ into a quantum circuit of size polynomial in $n$.

*Quantum circuits* [8] are a well-known graphical computational model for describing quantum computations. Qubits are represented by wires. Each unitary transformation $U$ acting on $n$ qubits can be represented as a gate $U$ with $n$ inputs and $n$ outputs. A circuit $C$ is an element of a PROP category ([10], a symmetric strict monoidal category) whose morphisms are generated by gates $G$ and wires. Let 𝟙 be the identity circuit (for any length) and $\circ$ and $\otimes$ be the composition and product, respectively. By abuse of notation, given $k$ circuits $C_1, \ldots, C_k$, $\circ_{i=1}^{k} C_i$ will denote the circuit $\tilde{C}_1 \circ \cdots \circ \tilde{C}_k$, where each circuit $\tilde{C}_i$ is obtained by tensoring $C_i$ appropriately with identities so that the output of $C_i$ matches the input of $C_{i+1}$. By construction, a circuit is acyclic. Each circuit $C_n$ can be indexed by its number $n \in \mathbb{N}$ of input wires (i.e., non-ancilla qubits) and computes a function $\llbracket C_n \rrbracket \in \mathcal{H}_{2^n} \to \mathcal{H}_{2^n}$. To deal with functions in $\mathcal{H} \to \mathcal{H}$, we consider families of circuits $(C_n)_{n \in \mathbb{N}}$, that are sequences of circuits such that each $C_n$ encodes computation on quantum states of length $n$. Hence each circuit has $n$ input qubits plus some extra ancilla qubits. These ancillas can be used to perform intermediate computations but also to represent functions whose output size is strictly greater than their input size. To avoid the consideration of families encoding undecidable properties, we put a uniformity restriction.

Definition 7. *A family of circuits* $(C_n)_{n \in \mathbb{N}}$ *is said to be* uniform *if there exists a polynomial time Turing machine that takes* $n$ *as input and outputs a representation of* $C_n$*, for all* $n \in \mathbb{N}$*.*

In quantifying the complexity of a circuit, it is necessary to specify the considered elementary gates, and define the complexity of an operation as the number of elementary gates needed to perform it. In our setting, we consider the following set of universal elementary gates $\{R_Y(\pi/4), P(\pi/4), \mathrm{CNOT}\}$. The size $\#C$ of a circuit $C$ is equal to the number of its gates and wires.

Definition 8. *A family of circuits* $(C_n)_{n \in \mathbb{N}}$ *is said to be* polynomial-size *with* $\alpha \in \mathbb{N} \to \mathbb{N}$ *ancilla qubits if there exists a polynomial* $P \in \mathbb{N}[X]$ *such that, for each* $n \in \mathbb{N}$*,* $\#C_n \le P(n)$ *and the number of ancilla qubits in* $C_n$ *is exactly* $\alpha(n)$*.*

Let $\chi_m : \mathcal{H}_{2^n} \to \mathcal{H}_{2^{n+m}}$ be defined by $\chi_m(|\psi\rangle) \triangleq |\psi\rangle \otimes |0^m\rangle$, for a state $|\psi\rangle$ of size $n$. Let $\xi_m : \mathcal{H}_{2^n} \to \mathcal{H}_{2^m}$, with $m \le n$, be defined by $\xi_m(|\psi\rangle) \triangleq \sum_{w \in \{0,1\}^m} \sum_{z \in \{0,1\}^{n-m}} \langle wz|\psi\rangle\, |w\rangle$. Finally, let $|w|$, for $w \in \{0,1\}^{\ast}$, be the size of the word $w$.
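The padding and projection maps can be written out on a toy state representation. The sketch below is an illustration under assumptions: states are dictionaries from bitstrings to amplitudes, and the function names `chi` and `xi` are chosen to mirror the two symbols. It follows the two formulas literally, so `xi` adds up the amplitudes of all basis states sharing the same first $m$ bits.

```python
def chi(state, m):
    """chi_m: append m ancilla qubits, all in state |0>."""
    return {bits + "0" * m: amp for bits, amp in state.items()}

def xi(state, m):
    """xi_m: keep the first m bits w and sum the amplitudes over every suffix z."""
    out = {}
    for bits, amp in state.items():
        out[bits[:m]] = out.get(bits[:m], 0) + amp
    return out

psi = {"01": 0.6, "10": 0.8}      # a normalized 2-qubit example state
padded = chi(psi, 3)              # 2 input qubits plus 3 ancillas
restored = xi(padded, 2)          # project back onto the first 2 qubits
```

Padding followed by projection onto the original length returns the input state, because after padding the only suffix with a non-zero amplitude is the all-zero one.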

Theorem 7. (Adapted from [17] and [11]) *A function* $f : \{0,1\}^\ast \to \{0,1\}^\ast$ *is in* fbqp *iff there exists a uniform polynomial-size family of circuits* $(C_n)_{n \in \mathbb{N}}$ *with* $\alpha$ *ancilla qubits s.t.* $\forall x \in \{0,1\}^\ast$*,* $\left| \langle f(x) |\, \xi_{|f(x)|} \circ [\![C_{|x|}]\!] \circ \chi_{\alpha(|x|)}(|x\rangle) \right|^2 \ge \frac{2}{3}$*.*

In Theorem 7, $[\![C_{|x|}]\!]$ is a function in $\mathcal{H}_{2^{|x|+\alpha(|x|)}} \to \mathcal{H}_{2^{|x|+\alpha(|x|)}}$. The function $\chi_{\alpha(|x|)}$ pads the input with ancillas in state $|0\rangle$ to match the circuit dimension. The function $\xi_{|f(x)|}$ projects the output of the circuit to match the length $|f(x)|$ of the function output. Hence, for $|x\rangle \in \mathcal{H}_{2^{|x|}}$, $\xi_{|f(x)|} \circ [\![C_{|x|}]\!] \circ \chi_{\alpha(|x|)}(|x\rangle) \in \mathcal{H}_{2^{|f(x)|}}$.
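The padding and projection maps can be made concrete on state vectors. The following is a minimal sketch, assuming a state on $n$ qubits is stored as a list of $2^n$ amplitudes indexed by the basis word read as a binary number (first qubit most significant); the names `chi` and `xi` and this indexing convention are illustrative choices, not part of the original development.

```python
from typing import List


def chi(psi: List[complex], m: int) -> List[complex]:
    """chi_m: pad a state with m ancilla qubits in state |0>, i.e. |psi> (x) |0^m>."""
    out = [0j] * (len(psi) << m)
    for index, amplitude in enumerate(psi):
        # The m appended (low-order) bits of the new index are all 0.
        out[index << m] = amplitude
    return out


def xi(psi: List[complex], m: int) -> List[complex]:
    """xi_m: map an n-qubit state to sum_w sum_z <wz|psi> |w>, with |w| = m."""
    n = len(psi).bit_length() - 1  # len(psi) is assumed to be 2**n
    assert len(psi) == 1 << n and 0 <= m <= n
    out = [0j] * (1 << m)
    for index, amplitude in enumerate(psi):
        # w is given by the m high-order bits of the index wz.
        out[index >> (n - m)] += amplitude
    return out
```

For instance, padding a one-qubit state with two ancillas and projecting back onto the first qubit returns the original amplitudes, which is the shape of the composition $\xi \circ [\![C]\!] \circ \chi$ when $C$ is the identity circuit.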

*Compilation to circuits.* For each pfoq program P, the existence of a polynomial-size uniform family of circuits $(C_n)_{n \in \mathbb{N}}$ that computes $[\![\mathrm{P}]\!]$ is entailed by the combination of Lemma 2 and Theorem 7. However, due to the complex machinery of QTM, the constructions of both proofs cannot be used in practice to generate a circuit. In this section, we exhibit an algorithm that directly compiles a pfoq program to a polynomial-size circuit. Note that this compilation process requires some care, since recursive procedure calls in quantum cases may yield an exponential number of calls. The remainder of this section is devoted to presenting an algorithm, named compile, which, for a given pfoq program P and a given integer $n$, produces a circuit $C_n$ such that $\forall |\psi\rangle \in \mathcal{H}_{2^n}$, $[\![\mathrm{P}]\!](|\psi\rangle) = \xi_n \circ [\![C_n]\!] \circ \chi_{\alpha(n)}(|\psi\rangle)$.

The compile algorithm uses two subroutines, named compr and optimize, and is defined by $\mathrm{compile}(\mathrm{P}, n) \triangleq \mathrm{compr}(\mathrm{P}, [1, \ldots, n], \cdot)$.

The subroutine compr (Algorithm 1) generates the circuit inductively on the program statement. It takes as inputs a program P, a list of qubit pointers $l$, and a control structure $cs$. A *control structure* $cs$ is a partial function in $\mathbb{N} \to \{0,1\}$, mapping a qubit pointer to a control value (of a quantum case). Let $\cdot$ be the control structure of empty domain. For $n \in \mathbb{N}$ and $k \in \{0,1\}$, $cs[n := k]$ is the control structure obtained from $cs$ by setting $cs(n) \triangleq k$. For a given $x \in \{0,1\}^\ast$, we say that state $|x\rangle$ *satisfies* $cs$ if, $\forall n \in \mathrm{dom}(cs)$, $cs(n) = k \Rightarrow |\langle k|_n |x\rangle|^2 = 1$. Two control structures $cs$ and $cs'$ are *orthogonal* if there does not exist a state $|x\rangle$ that satisfies both $cs$ and $cs'$. Note that if $\exists i \in \mathrm{dom}(cs) \cap \mathrm{dom}(cs')$ such that $cs(i) + cs'(i) = 1$, then $cs$ and $cs'$ are orthogonal.
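To make control structures tangible, here is a small sketch that models them as Python dictionaries from qubit pointers to control values. The function names (`update`, `satisfies`, `clash`) and the representation of a basis state as a string of bits with 1-based pointers are assumptions made for illustration only.

```python
from typing import Dict

ControlStructure = Dict[int, int]  # qubit pointer -> control value in {0, 1}


def update(cs: ControlStructure, n: int, k: int) -> ControlStructure:
    """Return cs[n := k] without mutating cs."""
    new_cs = dict(cs)
    new_cs[n] = k
    return new_cs


def satisfies(x: str, cs: ControlStructure) -> bool:
    """Basis state |x> satisfies cs if qubit n of x equals cs(n) for all n in dom(cs)."""
    return all(n <= len(x) and int(x[n - 1]) == k for n, k in cs.items())


def clash(cs1: ControlStructure, cs2: ControlStructure) -> bool:
    """Sufficient condition for orthogonality: some shared pointer i has cs1(i) + cs2(i) = 1."""
    return any(cs1[i] + cs2[i] == 1 for i in cs1.keys() & cs2.keys())
```

When `clash` holds, no basis state can satisfy both control structures, since the shared qubit would have to be 0 and 1 at once.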

Algorithm 1 (compr)

Input: $(\mathrm{P}, l, cs) \in \mathrm{Programs} \times \mathcal{L}(\mathbb{N}) \times (\mathbb{N} \to \{0,1\})$

    Let D :: S = P in
    if S = skip; then
        C ← 1                                                      ▷ Identity circuit
    else if S = s[i] *= U^f(j); and (s[i], l) ⇓_N n and (U^f(j), l) ⇓_{C^{2×2}} M then
        C ← M(cs, [n])                                             ▷ Controlled gate
    else if S = S1 S2 then
        C ← compr(D :: S1, l, cs) ∘ compr(D :: S2, l, cs)          ▷ Composition
    else if S = if b then S_true else S_false and (b, l) ⇓_B b then
        C ← compr(D :: S_b, l, cs)                                 ▷ Conditional
    else if S = qcase s[i] of {0 → S0, 1 → S1} and (s[i], l) ⇓_N n then
        C ← compr(D :: S0, l, cs[n := 0]) ∘ compr(D :: S1, l, cs[n := 1])   ▷ Quantum case
    else if S = call proc[i](s) and (s, l) ⇓_{L(N)} [] then
        C ← 1                                                      ▷ Nil call
    else if S = call proc[i](s) and (s, l) ⇓_{L(N)} l′ ≠ [] and (i, l) ⇓_Z n then
        if width_P(proc) = 0 then
            C ← compr(D :: S^proc{n/x}, l′, cs)                    ▷ Non-recursive call
        else if width_P(proc) = 1 then
            C ← optimize(D, [(cs, S^proc{n/x})], proc, l′, {})     ▷ Recursive call
        end if
    end if
    return C
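The inductive structure of compr can be illustrated on a deliberately reduced statement language. The sketch below is an assumption-laden toy: it covers only skip, single-qubit gate application, sequencing, and quantum case, with qubit pointers already evaluated to integers; classical conditionals and procedure calls are omitted. The class names and the representation of a circuit as a list of `(gate name, target, controls)` triples are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Gate:
    name: str
    qubit: int


@dataclass(frozen=True)
class Seq:
    first: "Stmt"
    second: "Stmt"


@dataclass(frozen=True)
class QCase:
    qubit: int
    zero: "Stmt"
    one: "Stmt"


Stmt = Union[Skip, Gate, Seq, QCase]
Controls = Tuple[Tuple[int, int], ...]
Circuit = List[Tuple[str, int, Controls]]  # gates listed in order of application


def compr(stmt: Stmt, cs: Optional[Dict[int, int]] = None) -> Circuit:
    """Compile a toy statement under control structure cs (empty by default)."""
    cs = {} if cs is None else cs
    if isinstance(stmt, Skip):
        return []  # identity circuit
    if isinstance(stmt, Gate):
        return [(stmt.name, stmt.qubit, tuple(sorted(cs.items())))]  # controlled gate
    if isinstance(stmt, Seq):
        return compr(stmt.first, cs) + compr(stmt.second, cs)  # composition
    if isinstance(stmt, QCase):
        # Each branch is compiled under the control structure extended with the case qubit.
        return (compr(stmt.zero, {**cs, stmt.qubit: 0})
                + compr(stmt.one, {**cs, stmt.qubit: 1}))
    raise TypeError(f"unsupported statement: {stmt!r}")
```

Note how the two branches of a quantum case always receive control structures that disagree on the case qubit, which is the source of the orthogonality used later for merging.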

Given a control structure $cs$ and a statement S, a *controlled statement* is a pair $(cs, \mathrm{S}) \in \mathrm{Cst} \triangleq (\mathbb{N} \to \{0,1\}) \times \mathrm{Statements}$. Intuitively, a controlled statement $(cs, \mathrm{S})$ denotes a statement controlled by the qubits whose indices are in $\mathrm{dom}(cs)$. For a unitary gate $U \in \mathcal{H}_{2^n} \to \mathcal{H}_{2^n}$, a control structure $cs$, and a list of pointers $l = [x_1, \ldots, x_n] \in \mathcal{L}(\mathbb{N})$ such that $\{x_1, \ldots, x_n\} \cap \mathrm{dom}(cs) = \emptyset$, $U(cs, l)$ denotes the circuit applying gate $U$ on qubits $\bar{q}[x_1], \ldots, \bar{q}[x_n]$ whenever, $\forall m \in \mathrm{dom}(cs)$, $\bar{q}[m]$ is in state $|cs(m)\rangle$. As demonstrated in [11], this circuit can be built with $O(\mathrm{card}(\mathrm{dom}(cs)))$ elementary gates and ancillas, and a single controlled-$U$ gate.

Fig. 4: Example of circuit U(cs, l)

*Example 5.* As an illustrative example, consider a binary gate U and a control structure cs such that dom(cs) = {1, 2, 3}, cs(1) = cs(2) = 1, and cs(3) = 0. Also consider a list <sup>l</sup> = [4, 5] ∈ L(N). The circuit <sup>U</sup>(cs, l) is provided in Figure 4.

Similarly, we can define a generalized Toffoli gate as a circuit of the shape $\mathrm{NOT}(cs, n)$. Since $\mathrm{card}(\mathrm{dom}(cs))$ will not scale with the size of the input, such a circuit has a constant cost in gates and ancillas and can thus be considered as an elementary gate. We will also be interested in rearranging wires under a given control structure. For two lists of qubit pointers $l_1 = [x_1, \ldots, x_n]$, $l_2 = [x'_1, \ldots, x'_n] \in \mathcal{L}(\mathbb{N})$, define $\mathrm{SWAP}(cs, l_1, l_2)$ as the circuit that swaps the wires in $l_1$ with the wires in $l_2$, controlled on $cs$. This circuit needs in the worst case one ancilla and $O(n)$ controlled SWAP gates (also known as Fredkin gates).
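On computational basis states, both $\mathrm{NOT}(cs, n)$ and $\mathrm{SWAP}(cs, l_1, l_2)$ act as classical reversible maps, which makes their behaviour easy to check. The sketch below assumes a basis state is stored as a dictionary from qubit pointers to bits; the function names are illustrative and only the action on basis states is modelled, not the decomposition into elementary gates.

```python
from typing import Dict, List

Bits = Dict[int, int]  # qubit pointer -> bit value


def satisfies(bits: Bits, cs: Dict[int, int]) -> bool:
    """True when every control qubit in dom(cs) carries the required value."""
    return all(bits[n] == k for n, k in cs.items())


def not_gate(cs: Dict[int, int], target: int, bits: Bits) -> Bits:
    """Generalized Toffoli NOT(cs, target): flip the target iff bits satisfies cs."""
    out = dict(bits)
    if satisfies(bits, cs):
        out[target] ^= 1
    return out


def swap_gate(cs: Dict[int, int], l1: List[int], l2: List[int], bits: Bits) -> Bits:
    """SWAP(cs, l1, l2): exchange the wires of l1 and l2 pairwise iff bits satisfies cs."""
    out = dict(bits)
    if satisfies(bits, cs):
        for a, b in zip(l1, l2):
            out[a], out[b] = out[b], out[a]
    return out
```

With the control structure of Example 5 ($cs(1) = cs(2) = 1$, $cs(3) = 0$), the gates fire exactly on inputs whose first three bits are 1, 1, 0.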

Let $\mathbb{D} \triangleq \mathcal{D}(\mathrm{Procedures} \times \mathbb{Z} \times \mathbb{N} \to \mathbb{N} \times \mathcal{L}(\mathbb{N}))$ be the set of dictionaries mapping keys of the shape $(\mathrm{proc}, i, j)$ to pairs of the shape $(a, l)$, where $i$ is the value of a classical parameter, $j$ is the size of a sorted set, and $a$ is a qubit index. We will denote the empty dictionary by $\{\}$. Let also a ← new ancilla() be an instruction that sets a to a fresh qubit index.

The subroutine optimize (Algorithm 2) treats the complex cases where circuit optimizations (merging) are needed, that is, recursive procedure calls. It takes as input a sequence of procedure declarations D, a list of controlled statements lCst, a procedure name proc, a list of qubit pointers $l$, and a dictionary Anc. The subroutine iterates on the list lCst of controlled statements, which records the statements left to be treated together with their control qubits. When recursive procedure calls appear in distinct branches of a quantum case, the algorithm merges these calls together. For that purpose, it uses new ancilla qubits as control qubits. Consider a procedure call of shape call proc[i](s); evaluated with respect to a given list $l \in \mathcal{L}(\mathbb{N})$, such that $(\mathrm{i}, l) \Downarrow_{\mathbb{Z}} i$, $(\mathrm{s}, l) \Downarrow_{\mathcal{L}(\mathbb{N})} l'$, and $(|\mathrm{s}|, l) \Downarrow_{\mathbb{N}} j$. If the key $(\mathrm{proc}, i, j)$ already exists in the dictionary Anc, the associated ancilla is re-used; otherwise, a fresh ancilla $a$ is created and $\mathrm{Anc}[\mathrm{proc}, i, j]$ is set to $(a, l')$. We can assume w.l.o.g. that the statement controlled on the ancilla is treated only after all the re-uses of the ancilla. This can be done without increasing the total complexity of optimize.

Some extra ancillas e are also created for swapping wires and are not explicitly indexed since they are not revisited by the subroutine, and are just considered unique. Ancillas a and e are indexed and treated as input qubits, therefore they can be part of the domain of control structures.

```
Algorithm 2 (optimize) Build circuit for recursive procedure proc
Inputs: (D, lCst, proc, l, Anc) ∈ Decl × L(Cst) × Procedures × L(N) × 𝔻
```

```
CL ← 1; CR ← 1; P ← D :: skip;
while lCst ≠ [] do
   (cs, S) ← hd(lCst); lCst ← tl(lCst)
   if S = S1 S2 then
      if w_P^proc(S1) = 1 then
         lCst ← lCst@[(cs, S1)]; CR ← compr(D :: S2, l, cs) ∘ CR
      else
         lCst ← lCst@[(cs, S2)]; CL ← CL ∘ compr(D :: S1, l, cs)
      end if
   end if
   if S = if b then S_true else S_false and (b, l) ⇓_B b then
      if w_P^proc(S_b) = 1 then
         lCst ← lCst@[(cs, S_b)]
      else
         CL ← CL ∘ compr(D :: S_b, l, cs)
      end if
   end if
   if S = qcase s[i] of {0 → S0, 1 → S1} and (s[i], l) ⇓_N n then
      if w_P^proc(S0) = 1 and w_P^proc(S1) = 1 then
         lCst ← lCst@[(cs[n := 0], S0), (cs[n := 1], S1)]
      else if w_P^proc(S1) = 0 then
         lCst ← lCst@[(cs[n := 0], S0)];
         CR ← compr(D :: S1, l, cs[n := 1]) ∘ CR
      else if w_P^proc(S0) = 0 then
         lCst ← lCst@[(cs[n := 1], S1)];
         CR ← compr(D :: S0, l, cs[n := 0]) ∘ CR
      end if
   end if
   if S = call proc′[i](s) and (s, l) ⇓_{L(N)} l′ ≠ [] and (i, l) ⇓_Z n then
      if (proc′, n, |l′|) ∈ Anc then
         Let (a, l″) = Anc[proc′, n, |l′|] in
         e ← new ancilla();
         CL ← CL ∘ NOT(cs, e) ∘ NOT(·[e = 1], a) ∘ SWAP(·[e = 1], l′, l″);
         CR ← SWAP(·[e = 1], l″, l′) ∘ NOT(·[e = 1], a) ∘ NOT(cs, e) ∘ CR
      else
         a ← new ancilla()
         Anc[proc′, n, |l′|] ← (a, l′);
         CL ← CL ∘ NOT(cs, a); CR ← NOT(cs, a) ∘ CR;
         lCst ← lCst@[(·[a = 1], S^proc′{n/x})]
      end if
   end if
end while
return CL ∘ CR
```

Theorem 8. *For any* P *in* pfoq*, there is* $Q \in \mathbb{N}[X]$ *such that,* $\forall n \in \mathbb{N}$*,* $\forall |\psi\rangle \in \mathcal{H}_{2^n}$*,* $[\![\mathrm{P}]\!](|\psi\rangle) = \xi_n \circ \mathrm{compile}(\mathrm{P}, n) \circ \chi_{\alpha(n)}(|\psi\rangle)$ *and* $\#\mathrm{compile}(\mathrm{P}, n) \le Q(n)$*.*

*Example 6.* compile(QFT, n) outputs the circuit provided in Example 1. Notice that there is no extra ancilla as no procedure call appears in the branch of a quantum case.

*Polynomial-size circuits.* We prove Theorem 8 by showing that the compile algorithm avoids any exponential growth of the circuit, using an argument based on orthogonal control structures. With a linear number of gates and a constant number of extra ancillas, we can merge calls referring to the same procedure, on different branches of a quantum case, when they are applied to sorted sets of equal size. An example of the construction is given in Figure 5, where two instances of a gate $U$ are merged into one using SWAP gates and gates controlled by orthogonal control structures.

Fig. 5: Example of circuit optimization.

The following lemma shows that multiple uses of a gate can be merged into one, provided they are applied under pairwise orthogonal control structures.

Lemma 4. *For any circuit* $C_n \triangleq \circ_{i=1}^{k} U(cs_i, l_i)$*, with a unitary gate* $U$*, pairwise orthogonal* $cs_1, \ldots, cs_k \in \mathrm{Cst}$*, and* $l_1, \ldots, l_k \in \mathcal{L}(\mathbb{N})$*, there exists a circuit* $C$ *using one controlled gate* $U$*,* $O(kn)$ *gates, and* $O(k)$ *ancillas, and such that* $[\![C]\!] = [\![C_n]\!]$*.*
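The merging idea can be checked exhaustively on basis states for a small instance. The sketch below is a classical simulation under several assumptions: $U$ is replaced by a stand-in reversible gate (a CNOT on two wires), $k = 2$, the two target lists are disjoint, and the ancilla layout follows the pattern of Algorithm 2 (one ancilla `a` controlling the single gate, one ancilla `e` routing the second branch). All names are hypothetical, and this is not a proof of the lemma.

```python
from itertools import product


def satisfies(bits, cs):
    return all(bits[q] == v for q, v in cs.items())


def controlled(cs, op, bits):
    """Apply the reversible map `op` iff `bits` satisfies cs; never mutates the input."""
    return op(dict(bits)) if satisfies(bits, cs) else dict(bits)


def flip(q):
    def op(b):
        b[q] ^= 1
        return b
    return op


def swap(l1, l2):
    def op(b):
        for x, y in zip(l1, l2):
            b[x], b[y] = b[y], b[x]
        return b
    return op


def u_on(l):
    """Stand-in for the gate U: a CNOT from wire l[0] to wire l[1]."""
    def op(b):
        b[l[1]] ^= b[l[0]]
        return b
    return op


def naive(bits, cs1, l1, cs2, l2):
    """Two separate controlled copies of U."""
    return controlled(cs2, u_on(l2), controlled(cs1, u_on(l1), bits))


def merged(bits, cs1, l1, cs2, l2, a, e):
    """One copy of U, controlled on ancilla a, surrounded by routing gates."""
    steps = [
        (cs1, flip(a)),          # branch cs1 raises a
        (cs2, flip(e)),          # branch cs2 raises e ...
        ({e: 1}, flip(a)),       # ... which raises a as well
        ({e: 1}, swap(l2, l1)),  # and routes the wires of l2 onto those of l1
    ]
    for cs, op in steps:
        bits = controlled(cs, op, bits)
    bits = controlled({a: 1}, u_on(l1), bits)  # the single controlled U
    for cs, op in reversed(steps):             # uncompute the routing
        bits = controlled(cs, op, bits)
    return bits


def check_all():
    cs1, cs2 = {1: 0}, {1: 1}  # orthogonal: they disagree on qubit 1
    l1, l2, a, e = [2, 3], [4, 5], 6, 7
    for values in product([0, 1], repeat=5):
        bits = dict(zip(range(1, 6), values))
        bits[a] = bits[e] = 0  # ancillas start in state 0
        if merged(bits, cs1, l1, cs2, l2, a, e) != naive(bits, cs1, l1, cs2, l2):
            return False
    return True
```

Orthogonality of the two control structures is what guarantees that at most one branch raises the ancilla `a`, so the single gate is applied at most once per basis state and both ancillas return to 0.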

Now we show that orthogonality is an invariant property of compile.

Lemma 5. *Orthogonality is an invariant property of the control structures in* lCst *of the subroutine* optimize*. In other words, for any two distinct pairs* $(cs, \mathrm{S})$*,* $(cs', \mathrm{S}')$ *in* lCst*,* $cs$ *and* $cs'$ *are orthogonal.*

Theorem 9. *For any* P *in* pfoq*,* compile(P, $n$) *runs in time* $O(n^{2|\mathrm{P}|+1})$*.*

*Proof.* Using Lemma 4 and Lemma 5.

As there is no circuit duplication in the assignments of compile, we can deduce from Theorem 9 that the compiled circuit is of polynomial size.

Corollary 1. *For any* P *in* pfoq*, there exists a polynomial* $Q \in \mathbb{N}[X]$ *such that* $\#\mathrm{compile}(\mathrm{P}, n) \le Q(n)$*.*


## On the Existential Arithmetics with Addition and Bitwise Minimum

Mikhail R. Starchak

St. Petersburg State University, St. Petersburg, Russia m.starchak@spbu.ru

Abstract. This paper presents a similar approach for existential first-order characterizations of the languages recognizable by finite automata, by Parikh automata, and by multi-counter machines over the alphabet $\{0, 1, \ldots, k-1\}^n$ for some $k \ge 2$. The set of $k$-FA-recognizable relations coincides with the set of relations which are existentially definable in the structure $\langle \mathbb{N}; 0, 1, +, \&_k, = \rangle$, where $\&_k$ corresponds to the bitwise minimum of base $k$. In order to obtain an existential first-order description of $k$-Parikh automata languages, we extend this structure with the predicate $\mathrm{EqNZB}_k(x, y)$, which is true if and only if $x$ and $y$ have the same number of non-zero bits in $k$-ary encoding. Using essentially the same ideas, we encode computations of $k$-multi-counter machines and thus show that every recursively enumerable relation over the natural numbers is existentially definable in the aforementioned structure supplemented with concatenation $z = x \frown_k y \rightleftharpoons z = x + k^{l_k(x)} y$, where $l_k(x)$ is the bit-length of $x$ in base $k$. This result gives us another proof of the DPR-theorem.

Keywords: Bitwise minimum · Büchi arithmetic · Parikh automata · Existential definability · Recursively enumerable sets · DPR-theorem · Concatenation

## 1 Introduction

In a recent paper [11], Haase and Różycki considered definability problems in $k$-Büchi arithmetic, an extension of Presburger arithmetic with a relation $V_k$ such that $V_k(x, y)$ if and only if $x$ is the largest power of $k$ that divides $y$. They proved that there are relations which are definable in $k$-Büchi arithmetic ($k$-definable) and not definable by any existential formula of the corresponding language. By a slight modification of a theorem of Villemaire [24, Corollary 2.4], they show that every $k$-definable relation can actually be expressed via some $\exists\forall$-formula, whereas Villemaire constructs an $\exists\forall\exists$-formula.

Büchi arithmetic of base $k \ge 2$ can be considered as a first-order characterization of the languages recognizable by finite-state automata over the alphabet $\{0, 1, \ldots, k-1\}^n$ (called $k$-FA-recognizable). Interpreting the words of this language as tuples $(x_1, \ldots, x_n)$ of natural numbers in base $k$ encoding, we obtain the Büchi-Bruyère theorem [3,5], which states that every relation $R \subseteq \mathbb{N}^n$ is $k$-FA-recognizable if and only if it is $k$-definable. A second-order version of this theorem (which was proved independently by Büchi [5], Elgot [9], and Trakhtenbrot [22]) says that every relation is 2-FA-recognizable iff it is weak monadic second-order (WMSO-)definable in the structure $\langle \mathbb{N}; S \rangle$, where $S$ is a unary function symbol for the successor function over the natural numbers. The WMSO-theory of $\langle \mathbb{N}; S \rangle$ is usually denoted by WS1S.

Returning to Villemaire's result, we see that his encoding of $k$-FA via $\exists\forall\exists$-formulas of the language of $k$-Büchi arithmetic uses a unique bounded universal quantifier. A similar construction often appears in logical descriptions of abstract machines. For example, Klaedtke and Rueß considered in [16] various definability and decidability properties for WMSO-formulas with successor $S$ and cardinality constraints of the form $|X_1| + \ldots + |X_r| < |Y_1| + \ldots + |Y_s|$; the corresponding WMSO-theory of $\mathbb{N}$ was denoted by WS1S$^{\mathrm{card}}$. They introduced Parikh automata, an extension of finite automata, and obtained an analogue of Büchi's Theorem, namely every relation recognizable by a Parikh automaton over the alphabet $\{0, 1\}^n$ is existentially WMSO-definable in $\mathbb{N}$ with $S$ and cardinality constraints, and vice versa. Here, only second-order variables are existentially quantified, while the formula which describes a computation of a given Parikh automaton still contains a universally quantified first-order variable (see [16, Theorem 10], where the universal quantifier $\forall x$ can be bounded by the maximal element of the existentially quantified second-order variable $U$).

Note that while WS1S is decidable, WS1S$^{\mathrm{card}}$ is already undecidable, and its decidable fragments [16, Theorem 16] were obtained as a consequence of decidability of the emptiness problem for Parikh automata. Translating these undecidability results into the first-order context, Bès showed [2, Proposition 3.8] in particular that the graph of the multiplication function is definable in the structure $\langle \mathbb{N}; 0, 1, +, V_2, \mathrm{EqNonZeroBits}, = \rangle$, where $\mathrm{EqNonZeroBits}(x, y)$ is true iff $x$ and $y$ have the same number of non-zero bits in their binary representations. This implies undecidability of the first-order theory of this structure, but it is not known, for example, whether the existential first-order theory is decidable. In the concluding section of [2], Bès remarks that *"it would be interesting to study the expressive power of fragments of FO arithmetic which include predicates like* EqNonZeroBits*"*. We will further shorten the name of this predicate to EqNZB.

The Davis-Putnam-Robinson theorem (DPR-theorem) [8] was a milestone in the undecidability proof of Hilbert's Tenth Problem. This theorem states that every relation $R \subseteq \mathbb{N}^n$ is recursively enumerable (r.e.) if and only if it is existentially first-order definable in the structure $\langle \mathbb{N}; 0, 1, +, \cdot, \exp, = \rangle$ (these relations are also called *exponential diophantine*). As the starting point, the proof uses the result of Davis [7], which states that every r.e. set is $\exists\forall\exists$-definable in the structure $\langle \mathbb{N}; 0, 1, +, \cdot, = \rangle$ with one bounded universal quantifier. It is important for us that the elimination of this quantifier in the proof of the DPR-theorem involves multiplication, factorial, and binomial coefficients, and does not seem useful when we try to eliminate a bounded universal quantifier in weaker structures. However, in 1976 Matiyasevich presented an alternative proof of the DPR-theorem [19] by a purely existential encoding of computations of Turing machines, which thus gives us another approach for eliminating the bounded universal quantifier [20, Section 6.1].

It is easy to modify the final steps of Matiyasevich's proof in order to obtain an existential formula of the language with 0, 1, addition, bitwise minimum $\&$, and concatenation $\frown$, where $t = x \frown y \rightleftharpoons t = x + 2^{l(x)} y$ and $l(x)$ is the bit-length of $x$. Kummer's lemma [18] then plays a crucial role, since it gives an exponential diophantine representation of bitwise minimum (see also an exponential diophantine representation of the masking relation in [14]). Note that it is not difficult to define $\&$ in the structure $\langle \mathbb{N}; 0, 1, +, V_2, = \rangle$ by a formula with one bounded universal quantifier, whereas there is an existential formula that defines $V_2$ in $\langle \mathbb{N}; 0, 1, +, \&, = \rangle$. This suggests the question whether every 2-FA-recognizable relation is existentially first-order definable in $\langle \mathbb{N}; 0, 1, +, \&, = \rangle$.

In Theorem 1, we show that every relation is actually $k$-FA-recognizable if and only if it is existentially definable in the structure $\langle \mathbb{N}; 0, 1, +, \&_k, = \rangle$, where $\&_k$ corresponds to the binary bitwise minimum operation of base $k$. The same approach is applied in Theorem 2 to obtain an existential first-order characterization of the languages recognizable by Parikh automata over the alphabet $\{0, 1, \ldots, k-1\}^n$. In this case, the structure must be extended by the binary predicate $\mathrm{EqNZB}_k$, which is true for those pairs of natural numbers $(x, y)$ such that $x$ and $y$ have the same number of non-zero bits of base $k$.

Applying essentially the same ideas as in Theorem 1, we are able to show in Theorem 3 that every relation $R \subseteq \mathbb{N}^n$ is recognizable by multi-counter machines over the alphabet $\{0, 1, \ldots, k-1\}^n$ if and only if it is existentially definable in the structure $\langle \mathbb{N}; 0, 1, +, \&_k, \frown_k, = \rangle$, where $z = x \frown_k y \rightleftharpoons z = x + k^{l_k(x)} y$ and $l_k(x)$ is the bit-length of $x$ in base $k$. Since such machines recognize exactly the r.e. sets, this provides yet another [14,19,20] proof of the DPR-theorem by purely existential arithmetization of abstract machines.

## 2 Definitions and the main example

This section recalls some basic definitions from logic and automata theory, which will be used in the sequel. Then we illustrate the main idea of the existential characterisations constructed in Sections 3 and 4.

## 2.1 Definability and automata

First-order definability. The domain of all the structures considered in this paper will be the set of natural numbers $\mathbb{N} = \{0, 1, 2, \ldots\}$, and we will consider existential definability in some extensions of $\langle \mathbb{N}; 0, 1, +, = \rangle$.

Denote by $\mathcal{L}_\sigma$ the first-order language of some signature $\sigma$. An $\mathcal{L}_\sigma$-formula $\varphi$ is existential if it has the form $\exists \mathbf{x}\, \psi(\mathbf{x}, \mathbf{y})$, where $\psi(\mathbf{x}, \mathbf{y})$ is a quantifier-free $\mathcal{L}_\sigma$-formula. Here, $\mathbf{x}$ denotes a list of variables $x_1, \ldots, x_n$. We say that an $n$-ary relation $R$ over $\mathbb{N}$ is first-order (FO-)definable in the structure $\langle \mathbb{N}; \sigma \rangle$ if there exists an $\mathcal{L}_\sigma$-formula $\varphi(\mathbf{x})$ such that for every $\mathbf{a} \in \mathbb{N}^n$ we have $R(\mathbf{a})$ if and only if $\varphi(\mathbf{a})$. When the formula $\varphi(\mathbf{x})$ is existential, the corresponding relation is called existentially first-order ($\exists$FO-)definable, and similarly for the case of quantifier-free formulas, universal formulas and other quantifier prefixes. We will subsequently write the prefix "FO" in the cases where we also discuss second-order definability, and in general it will be omitted.

In this paragraph, we focus on definability in the structure $\langle \mathbb{N}; 0, 1, +, V_k, = \rangle$, where $k \ge 2$ is an integer, and $V_k$ is a binary relation such that $V_k(x, y)$ if and only if $x$ is the largest power of $k$ dividing $y$. Büchi arithmetic of base $k$ is the first-order theory of this structure. The relations definable in this structure are called $k$*-definable*. Recall that for every integer $l \ge 2$ that is multiplicatively independent of $k$ (i.e., $k^a \ne l^b$ for all positive integers $a, b$), $V_l$ is not definable in $\langle \mathbb{N}; 0, 1, +, V_k, = \rangle$ [23,24] (see also a generalization of this result by Bès [1]). In the following, we consider some fixed base $k$. Let $\&_k$ be the binary bitwise minimum operation of base $k$, where we assume that the natural number of smaller bit-length is supplemented with a sufficient number of leading zeros. For example, in base 3 we have $120202 \,\&_3\, 21201201 = 100201$. It is not difficult to prove the following lemma.

Lemma 1. *Every relation is* $k$*-definable if and only if it is definable in the structure* $\langle \mathbb{N}; 0, 1, +, \&_k, = \rangle$*.*

*Proof.* In order to define bitwise minimum, for every $j \in [0..k-1]$ we use the relation $X_{k,j}(x, y)$, which is defined as *"*$x$ *is a power of* $k$ *and the coefficient of this power of* $k$ *in the representation of* $y$ *in base* $k$ *equals* $j$*"*. There is a simple existential formula for this relation in [4,11,24]:

$$X_{k,j}(x,y) \rightleftharpoons V_k(x,x) \land \exists z\, \exists t\, \exists u\, \big(y = z + jx + t \land z < x \land (t = 0 \lor (V_k(u,t) \land x < u))\big),$$

where $x < y \rightleftharpoons \exists z\, (y = x + z + 1)$. Therefore, the graph of bitwise minimum can be expressed by a formula with a universal quantifier

$$z = x \,\&_k\, y \rightleftharpoons \forall t \bigwedge_{(i,j)\in[0..k-1]^2} \left( X_{k,i}(t,x) \land X_{k,j}(t,y) \Leftrightarrow X_{k,\min(i,j)}(t,z) \right).$$

For the converse, by using monus $z = x \dot{-} y \rightleftharpoons (z = 0 \land x < y) \lor (x = z + y)$, define the set of powers of $k$ by the formula $P_k(x) \Leftrightarrow (kx \dot{-} 1) \,\&_k\, x = x \land \lnot\, x = 0$. Finally, we have $V_k(x, y) \Leftrightarrow P_k(x) \land \bigvee_{j \in [1..k-1]} (kx \dot{-} 1) \,\&_k\, y = jx$.
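The digit-level reading of these formulas can be tested by brute force on small numbers. The sketch below is illustrative only: the function names are assumptions, $x$ ranges over actual powers of $k$ (so the $P_k(x)$ conjunct is taken as given rather than evaluated), and $V_k(x, y)$ is taken to fail for $y = 0$.

```python
def bitwise_min(x: int, y: int, k: int) -> int:
    """Digit-wise minimum of x and y in base k (missing leading digits count as 0)."""
    result, weight = 0, 1
    while x > 0 and y > 0:
        x, dx = divmod(x, k)
        y, dy = divmod(y, k)
        result += min(dx, dy) * weight
        weight *= k
    return result


def is_power(x: int, k: int) -> bool:
    while x > 1 and x % k == 0:
        x //= k
    return x == 1


def largest_power_dividing(t: int, k: int) -> int:
    """The unique u with V_k(u, t), for t > 0."""
    p = 1
    while t % (p * k) == 0:
        p *= k
    return p


def x_formula(x: int, y: int, j: int, k: int) -> bool:
    """Existential formula for X_{k,j}(x, y): search for the witness z (t and u are then forced)."""
    if not is_power(x, k):  # V_k(x, x) holds exactly when x is a power of k
        return False
    for z in range(x):  # z < x
        t = y - z - j * x
        if t < 0:
            break
        if t == 0 or largest_power_dividing(t, k) > x:
            return True
    return False


def v_formula(x: int, y: int, k: int) -> bool:
    """Disjunction over j in [1..k-1] of (kx - 1) &_k y = jx, for x a power of k."""
    return any(bitwise_min(k * x - 1, y, k) == j * x for j in range(1, k))
```

Here `int("120202", 3)` parses a base-3 numeral, so the worked example of bitwise minimum in base 3 can be replayed directly with `bitwise_min`.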

We see that $X_{k,j}(x, y)$ can be defined in $\langle \mathbb{N}; 0, 1, +, \&_k, = \rangle$ by the quantifier-free formula $P_k(x) \land y \,\&_k\, x = jx$. Let $\lambda_k(x)$ be the greatest power of $k$ less than or equal to $x$ when $x > 0$, and $\lambda_k(0) = 1$. Formally, we have the definition $y = \lambda_k(x) \Leftrightarrow (x = 0 \land y = 1) \lor (P_k(y) \land y \le x \land x < ky)$. Now an analogue of bitwise negation can be defined as follows: $\sim_k(y, x) = (k\lambda_k(y) \dot{-} 1) \dot{-} x \,\&_k\, (k\lambda_k(y) \dot{-} 1)$. Here, $\sim_k(y, x)$ has the same bit-length as $y$, and we assume that $\&_k$ has a higher precedence than $+$ or monus. For our purposes, it is useful to include in the signature a binary function symbol for bitwise maximum

$$\begin{aligned} z = x \,|_k\, y \Leftrightarrow{} & \big(x < y \land z = {\sim_k}(y, {\sim_k}(y, x) \,\&_k\, {\sim_k}(y, y))\big) \lor {} \\ & \big(y \le x \land z = {\sim_k}(x, {\sim_k}(x, x) \,\&_k\, {\sim_k}(x, y))\big). \end{aligned}$$

We will write $\lfloor x / k^n \rfloor$ with some fixed natural number $n$ for the function whose graph is quantifier-free definable by the formula $y = \lfloor x / k^n \rfloor \Leftrightarrow k^n y \le x \land x < k^n (y + 1)$. The function $\mathbf{1}_k(y)$ gives a natural number of the same bit-length as $y$, but with all $k$-ary digits equal to one: $x = \mathbf{1}_k(y) \Leftrightarrow (k - 1)x = k\lambda_k(y) \dot{-} 1$. For notational convenience, let us introduce a binary predicate symbol $\preccurlyeq_k$ such that $x \preccurlyeq_k y \rightleftharpoons x \,\&_k\, y = x$. The following lemma summarizes these definability results and will be implicitly used in the next sections.

Lemma 2. The predicates $P_k$, $V_k$, $X_{k,j}$, $<$, $\le$ and the graphs of the functions $\dot{-}$, $\lambda_k$, $\sim_k$, $\mathbf{1}_k$, $|_k$, and $\lfloor \cdot / k^n \rfloor$ for every fixed $n \ge 1$ are $\exists$-definable in the structure $\langle \mathbb{N}; 0, 1, +, \&_k, = \rangle$.
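The definitions of $\lambda_k$, $\sim_k$, $\mathbf{1}_k$ and bitwise maximum can be transcribed directly and compared against a digit-by-digit maximum. This is a sketch under the reading that all subtractions are monus; the function names are assumptions introduced for illustration.

```python
def bitwise_min(x: int, y: int, k: int) -> int:
    result, weight = 0, 1
    while x > 0 and y > 0:
        x, dx = divmod(x, k)
        y, dy = divmod(y, k)
        result += min(dx, dy) * weight
        weight *= k
    return result


def monus(a: int, b: int) -> int:
    return a - b if a >= b else 0


def lam(x: int, k: int) -> int:
    """lambda_k(x): greatest power of k not exceeding x, with lambda_k(0) = 1."""
    p = 1
    while p * k <= x:
        p *= k
    return p


def neg(y: int, x: int, k: int) -> int:
    """~_k(y, x): digit-wise complement of x within the bit-length of y."""
    mask = k * lam(y, k) - 1  # all digits equal to k-1, same length as y
    return monus(mask, bitwise_min(x, mask, k))


def ones(y: int, k: int) -> int:
    """1_k(y): same bit-length as y, every digit equal to one."""
    return (k * lam(y, k) - 1) // (k - 1)


def bitwise_max(x: int, y: int, k: int) -> int:
    """x |_k y via two negations and one bitwise minimum, as in the displayed formula."""
    if x < y:
        return neg(y, bitwise_min(neg(y, x, k), neg(y, y, k), k), k)
    return neg(x, bitwise_min(neg(x, x, k), neg(x, y, k), k), k)


def digitwise_max(x: int, y: int, k: int) -> int:
    """Reference implementation: maximum taken digit by digit."""
    result, weight = 0, 1
    while x > 0 or y > 0:
        x, dx = divmod(x, k)
        y, dy = divmod(y, k)
        result += max(dx, dy) * weight
        weight *= k
    return result
```

The case split on $x < y$ only selects the longer argument as the reference length, so that complementing twice does not lose leading digits.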

The existential encoding of $k$-automata in Subsection 2.2 uses an $\exists$-definable function, which echoes a construction that was applied by Matiyasevich [19] in his arithmetization of Turing machines. For every $a \in [1..k-1]$ the function $\Theta_{k,a}(x)$ substitutes 1 for every digit of $x$ equal to $a$, and 0 otherwise. Then, the graph of this function is defined as follows:

$$\begin{aligned} y = \Theta_{k,a}(x) \Leftrightarrow \exists x_1 \ldots \exists x_{k-1} \Big( & \bigwedge_{1 \le i < j \le k-1} x_i \,\&_k\, x_j = 0 \ \land \\ & (x_1 + \ldots + x_{k-1}) \preccurlyeq_k \mathbf{1}_k(x) \ \land \\ & x_1 + 2x_2 + \ldots + (k-1)x_{k-1} = x \land y = x_a \Big). \end{aligned} \tag{1}$$

Note that each digit in the $k$-ary representation of every quantified variable in (1) is either 0 or 1. Moreover, if we denote $\bar{\mathbf{1}}_k(x) \rightleftharpoons x \,\&_k\, \mathbf{1}_k(x)$, then the sum $x_1 + \ldots + x_{k-1}$ is exactly $\bar{\mathbf{1}}_k(x)$. In the case of digit zero, the function $\Theta_{k,0}$ has an extra parameter that specifies the number of leading zeros, which must be replaced by ones:

$$y = \Theta_{k,0}(t, x) \Leftrightarrow y = \mathbf{1}_k(t) \dot{-} \bar{\mathbf{1}}_k(x). \tag{2}$$

In particular, when $\lambda_k(t) < \lambda_k(x)$, we always have $\Theta_{k,0}(t, x) = 0$, and otherwise we obtain, for example, $\Theta_{3,0}(100000, 1020) = 110101$.
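The digit-indicator functions are easy to compute directly, which gives a quick check of the example and of the decomposition $x = x_1 + 2x_2 + \ldots + (k-1)x_{k-1}$ with $x_a = \Theta_{k,a}(x)$. The sketch below is illustrative: the function names are assumptions, $\bar{\mathbf{1}}_k(x)$ is computed as the sum of the indicators (which the text states to be equal to it), and the subtraction in (2) is read as monus.

```python
def theta(k: int, a: int, x: int) -> int:
    """Theta_{k,a}(x) for a in [1..k-1]: 1 at every digit of x equal to a, 0 elsewhere."""
    result, weight = 0, 1
    while x > 0:
        x, digit = divmod(x, k)
        if digit == a:
            result += weight
        weight *= k
    return result


def lam(x: int, k: int) -> int:
    """lambda_k(x): greatest power of k not exceeding x, with lambda_k(0) = 1."""
    p = 1
    while p * k <= x:
        p *= k
    return p


def ones(k: int, t: int) -> int:
    """1_k(t): same bit-length as t, every digit equal to one."""
    return (k * lam(t, k) - 1) // (k - 1)


def ones_bar(k: int, x: int) -> int:
    """1-bar_k(x): 1 at every non-zero digit of x, computed as the sum of the indicators."""
    return sum(theta(k, a, x) for a in range(1, k))


def theta0(k: int, t: int, x: int) -> int:
    """Theta_{k,0}(t, x) = 1_k(t) monus 1-bar_k(x)."""
    return max(ones(k, t) - ones_bar(k, x), 0)
```

Reading numerals in base 3 with `int(s, 3)` reproduces the example from the text.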

Remark 1. In Subsection 2.2 and Section 3 it is convenient to write $\Theta_{k,a}(t, x)$ instead of $\Theta_{k,a}(x)$ when $a \in \{1, \ldots, k-1\}$. In Section 4 there is no need to consider auxiliary zeros, and we use $\Theta_{k,a}$ with a single parameter, assuming that $\Theta_{k,0}(x) \rightleftharpoons \Theta_{k,0}(x, x)$.

We conclude this paragraph by defining a set of natural numbers $\bar{\mathbf{1}}_k(\mathbb{N}) = \{\bar{\mathbf{1}}_k(x) \mid x \in \mathbb{N}\}$. This definition will be useful in the next paragraph.

Second-order definability. Similarly to Bès [2], let us denote by $\mathcal{F}$ the set of finite subsets of $\mathbb{N}$ and also define a function $\mathrm{cod}_k : \mathcal{F}^n \to \mathbb{N}^n$ which maps every tuple $(X_1, \ldots, X_n) \in \mathcal{F}^n$ to the tuple of non-negative integers $\mathrm{cod}_k(\mathbf{X}) = \left( \sum_{i \in X_1} k^i, \ldots, \sum_{i \in X_n} k^i \right)$. We see that the image of $\mathrm{cod}_k$ is $\bar{\mathbf{1}}_k(\mathbb{N})$. This function establishes a connection between first-order definability and weak monadic second-order (WMSO-)definability in $\langle \mathbb{N}; S \rangle$ in the following way.

Recall that the WMSO-language L<sup>WMSO</sup><sub>σ</sub> allows quantification over finite subsets of the domain, and its signature σ has an auxiliary binary predicate symbol ∈ for the membership relation x ∈ X. Again, let the domain of our structures be the set of natural numbers N. Then a relation R ⊆ F<sup>n</sup> is *WMSO-definable in the structure* ⟨N; σ⟩ if there exists an L<sup>WMSO</sup><sub>σ</sub>-formula ϕ(X1, ..., Xn) such that R(A) ⇔ ϕ(A) for every A ∈ F<sup>n</sup>. As was explicitly shown by Villemaire [23, Theorem 3.3], a relation R ⊆ F<sup>n</sup> is WMSO-definable in the structure ⟨N; S⟩ if and only if cod2(R) is FO-definable in ⟨N; 0, 1, +, V2, =⟩.

Note that codk is bijective only in the case k = 2, when we have 1¯2(N) = N. In the case k > 2, we can transfer FO-definability results for extensions of k-Büchi arithmetic to their WMSO-definability analogues using the function cod¯k : N → F<sup>k−1</sup>, which maps every x ∈ N to the tuple cod¯k(x) = (cod<sub>k</sub><sup>−1</sup>(Θk,1(x)), ..., cod<sub>k</sub><sup>−1</sup>(Θk,k−1(x))). This function extends componentwise to cod¯k : N<sup>n</sup> → (F<sup>k−1</sup>)<sup>n</sup>. We use cod¯k to establish a relationship between ∃FO-definability in ⟨N; 0, 1, +, &k, EqNZBk, =⟩ and ∃WMSO-definability in ⟨N; S⟩ extended with cardinality constraints of the form |X1| + ... + |Xr| < |Y1| + ... + |Ys|. Section 3 focuses on existential definability in these structures and recognizability by Parikh automata [16]. We say that R ⊆ F<sup>n</sup> is *existentially (*∃*)WMSO-definable in the structure* ⟨N; σ⟩ if there exists an L<sup>WMSO</sup><sub>σ</sub>-formula ∃Y ϕ(X, Y), where ϕ(X, Y) may include arbitrary first-order quantifiers, such that for every A ∈ F<sup>n</sup> we have R(A) if and only if ∃Y ϕ(A, Y).
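The two encodings can be made concrete for a single set and a single number. The following Python sketch uses hypothetical names (`cod`, `cod_inverse`, `cod_bar`) and reads the barred function directly off its definition: the a-th component is the set of positions at which x has the digit a.

```python
def cod(k, X):
    """cod_k for one finite set X of naturals: sum of k**i over i in X."""
    return sum(k**i for i in set(X))

def cod_inverse(k, y):
    """Inverse of cod_k on its image: positions of the digit 1 in y."""
    positions, i = set(), 0
    while y > 0:
        y, d = divmod(y, k)
        if d not in (0, 1):
            raise ValueError("y is not in the image of cod_k")
        if d == 1:
            positions.add(i)
        i += 1
    return positions

def cod_bar(k, x):
    """Barred cod_k: tuple (A_1, ..., A_{k-1}), A_a = positions of digit a in x."""
    sets = [set() for _ in range(k - 1)]
    i = 0
    while x > 0:
        x, d = divmod(x, k)
        if d:
            sets[d - 1].add(i)
        i += 1
    return tuple(sets)
```

For instance, `cod(3, {0, 2})` is 10 (ternary 101), and `cod_bar(3, 33)` is `({3}, {1})` because 33 is 1020 in ternary. For k = 2 the single component of `cod_bar(2, x)` coincides with `cod_inverse(2, x)`.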

The following lemma shows that it is sufficient to extend ⟨N; S⟩ with the relation EqCard(X, Y) ⇌ |X| = |Y| to reason about ∃WMSO-definability in N with successor S and cardinality constraints.

Lemma 3. *Every cardinality constraint* |X1| + ... + |Xr| < |Y1| + ... + |Ys| *is existentially WMSO-definable in the structure* ⟨N; S, EqCard⟩*.*

*Proof.* Let us first define the graph of ∩ using a formula with one universal first-order quantifier, Z = X ∩ Y ⇔ ∀x(x ∈ Z ⇔ x ∈ X ∧ x ∈ Y) (and, analogously, the graphs of union Z = X ∪ Y and difference Z = X \ Y), and the empty set X = ∅ ⇔ ∀x(¬x ∈ X).

Now it is not difficult to see that

$$\begin{aligned} |X\_1| + \ldots + |X\_r| < |Y\_1| + \ldots + |Y\_s| \Leftrightarrow{} & \exists U \exists V \exists X\_1' \dots \exists X\_r' \exists Y\_1' \dots \exists Y\_s' \Big( \\ & \bigwedge\_{1 \le i < j \le r} X\_i' \cap X\_j' = \emptyset \land \bigwedge\_{1 \le i \le r} EqCard(X\_i, X\_i') \land \\ & \bigwedge\_{1 \le i < j \le s} Y\_i' \cap Y\_j' = \emptyset \land \bigwedge\_{1 \le i \le s} EqCard(Y\_i, Y\_i') \land \\ & \bigcup\_{1 \le i \le r} X\_i' = U \land \bigcup\_{1 \le i \le s} Y\_i' = V \land U \cap V = U \land \neg(V \setminus U = \emptyset) \Big). \end{aligned} \tag{3}$$

The following fact is an analogue of Villemaire's theorem [23]. Note that when k = 2 the function cod¯2 is exactly cod<sub>2</sub><sup>−1</sup>.


The proof of this proposition is rather straightforward and follows along similar lines as the proof of Villemaire's theorem. Only notice that in order to deal with universal FO-quantifiers in *(i)*, we apply Corollary 1 from Subsection 2.2.

Klaedtke and Rueß show in [16] that a relation R ⊆ F<sup>n</sup> is existentially WMSO-definable in the structure ⟨N; S, EqCard⟩ if and only if it is recognizable by some Parikh automaton over the alphabet {0, 1}. By reduction to the emptiness problem for Parikh automata, they show that satisfiability of existential WMSO-formulas in the structure ⟨N; S, EqCard⟩ is decidable. The next paragraph gives the necessary definitions.

Automata languages. Büchi-Bruyère's theorem [4,5] states that a relation is first-order definable in the structure ⟨N; 0, 1, +, Vk, =⟩ if and only if it is recognizable by a finite k-automaton. Haase and Różycki [11] prove, however, that this statement fails if we consider *existential* first-order definability in ⟨N; 0, 1, +, Vk, =⟩. We first recall some automata-theoretic definitions and then show that substituting &k for Vk yields the desired existential description of k-recognizable sets.

Let Σ be some alphabet and let Σ<sup>∗</sup> denote the set of words of finite length over Σ, including the unique empty word ε of length 0. Then a *(non-deterministic) finite* Σ*-automaton (*Σ*-FA)* is a 4-tuple A = (Q, q0, F, δ), where Q = {q0, ..., qs} is a finite set of states with initial state q0 and the set F ⊆ Q of final states, and δ : Q × Σ → 2<sup>Q</sup> is the transition function, where 2<sup>Q</sup> is the power set of Q. A configuration of A is a pair (q, x), where q ∈ Q is the current state and x ∈ Σ<sup>∗</sup> is the unused part of the input word. The transition relation → over configurations of A is defined such that (q, ax) → (q′, x) if and only if q′ ∈ δ(q, a). A sequence of transitions between configurations is called a *computation of* A. We say that x = x0x1 ··· xt ∈ Σ<sup>t+1</sup> is accepted by a given Σ-FA A if there is an accepting computation of A for x, that is, a sequence (q0, x0x1...xt) → (q′, x1...xt) → ··· → (q″, xt) → (qf, ε) for some qf ∈ F. The set of all words x ∈ Σ<sup>∗</sup> accepted by a Σ-FA A is the language recognized by this automaton, denoted by L(A).
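The acceptance condition just defined can be sketched in Python by tracking the set of states reachable after each letter. The function name `accepts`, the dictionary encoding of δ, and the example automaton are assumptions made for illustration; the sketch also accepts the empty word exactly when q0 ∈ F, a case the definition above does not spell out.

```python
def accepts(delta, q0, final, word):
    """Non-deterministic acceptance: delta maps (state, letter) to a set of states."""
    current = {q0}
    for letter in word:
        # Follow every possible transition on this letter.
        current = {q2 for q in current for q2 in delta.get((q, letter), set())}
    return bool(current & final)

# Hypothetical example: words over {0, 1} that contain the factor "11".
EXAMPLE_DELTA = {
    ("s", "0"): {"s"},
    ("s", "1"): {"s", "a"},
    ("a", "1"): {"f"},
    ("f", "0"): {"f"},
    ("f", "1"): {"f"},
}
```

Here `accepts(EXAMPLE_DELTA, "s", {"f"}, "0110")` holds, while `accepts(EXAMPLE_DELTA, "s", {"f"}, "0101")` does not.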

A *finite* k*-automaton (*k*-FA)* is defined as a Σ<sup>n</sup><sub>k</sub>-FA, where every letter from Σ<sup>n</sup><sub>k</sub> is an n-tuple of digits from Σk = {0, 1, ..., k − 1}. To each language L ⊆ (Σ<sup>n</sup><sub>k</sub>)<sup>∗</sup> there corresponds a relation RL over N<sup>n</sup> in the following way: RL = { Σ<sub>i=0..t</sub> xi k<sup>i</sup> | x0 ··· xt ∈ L }, where the sum is taken componentwise. An n-ary relation R over N is called *k-FA-recognizable* if there exists a k-FA A such that for every a ∈ N<sup>n</sup> we have R(a) ⇔ RL(A)(a). For technical convenience, the notion of k-recognizability is commonly defined [4,23,24] for deterministic k-FA (k-DFA), where for every state q and letter a ∈ Σ<sup>n</sup><sub>k</sub> it holds that |δ(q, a)| ≤ 1. Since Σ-FA and Σ-DFA recognize the same class of languages [17], i.e. the class of regular languages over the alphabet Σ, this restriction does not change the class of recognizable relations. In our logical characterization of k-FA-recognizable relations we will not benefit from such restrictions on the transition function.

The definition of Σ-FA can be extended by adjoining to every letter of Σ a vector v ∈ D, where D is a finite subset of N<sup>m</sup>, and imposing certain restrictions on the accepting sequences of transitions to obtain Parikh finite automata (Σ-PFA). That is, for some m > 0 and a finite set D ⊆ N<sup>m</sup>, a Σ-PFA is a pair (A, ϕ), denoted by Aϕ, where A is a (Σ × D)-FA and ϕ(x1, ..., xm) is an existential L<sub>⟨0,1,+,=⟩</sub>-formula. It is convenient to think of a configuration of a Σ-PFA as an (m + 2)-tuple (q, x, y1, ..., ym), where the pair (q, x) is the same as in the definition of configurations of Σ-FA, and (y1, ..., ym) is a vector from N<sup>m</sup>. The transition relation between two configurations of a Σ-PFA Aϕ is now defined as follows: (q, ax, y1, ..., ym) → (q′, x, y1 + d1, ..., ym + dm) if and only if q′ ∈ δ(q, (a, d1, ..., dm)). A word x = x0x1 ··· xt ∈ Σ<sup>t+1</sup> is accepted by Aϕ if there is a computation (q0, x0x1 ··· xt, 0, ..., 0) → (q′, x1 ··· xt, y′1, ..., y′m) → ··· → (q″, xt, y″1, ..., y″m) → (qf, ε, y1, ..., ym) for some qf ∈ F such that the formula ϕ(y1, ..., ym) is true. We denote by L(Aϕ) the language recognized by the Σ-PFA Aϕ.
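A minimal Python sketch of this acceptance condition follows. It is an illustration under stated assumptions rather than the construction of [16]: transitions are stored as a map from (state, letter) to a set of (next state, vector) pairs, which is an equivalent repackaging of a (Σ × D)-FA, and the constraint ϕ is passed as an arbitrary Python predicate instead of an existential formula. All names are hypothetical.

```python
def pfa_accepts(delta, q0, final, phi, m, word):
    """Parikh-automaton acceptance with m counters that only accumulate vectors."""
    configs = {(q0, (0,) * m)}
    for letter in word:
        configs = {
            (q2, tuple(y + dj for y, dj in zip(ys, d)))
            for (q, ys) in configs
            for (q2, d) in delta.get((q, letter), set())
        }
    # Accept if some run ends in a final state with a vector satisfying phi.
    return any(q in final and phi(*ys) for (q, ys) in configs)

# Hypothetical example: words over {0, 1} with equally many 0s and 1s.
EQUAL_DELTA = {
    ("q", "0"): {("q", (1, 0))},
    ("q", "1"): {("q", (0, 1))},
}
```

For example, `pfa_accepts(EQUAL_DELTA, "q", {"q"}, lambda a, b: a == b, 2, "0110")` holds and the same call on `"011"` does not. The language is not regular, which illustrates why the constraint ϕ adds expressive power over plain Σ-FA.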

In order to deal with definability over the natural numbers, we again consider Σ<sup>n</sup><sub>k</sub>-PFA, which we call k-Parikh finite automata (k-PFA). The k-PFA-recognizable relations R ⊆ N<sup>n</sup> are defined analogously. The prefixes Σ- and k- will sometimes be omitted when the exact alphabet Σ or the value of k is not significant.

The original definition of Parikh automata [16] uses semi-linear sets C ⊆ N<sup>t</sup> instead of existential formulas of Presburger arithmetic, but it is well-known [10] that these definitions of PFA are equivalent. The main result by Klaedtke and Rueß [15, Theorems 12 and 15] states that a relation R ⊆ F<sup>n</sup> is ∃WMSO-definable in the structure ⟨N; S, EqCard⟩ if and only if the relation cod¯<sub>2</sub><sup>−1</sup>(R) is 2-PFA-recognizable. The "only if" part of this WMSO-characterization follows from the fact that the class of languages recognizable by PFA is closed under union, intersection, left and right quotients [15, Property 4] and that EqCard and its negation are recognizable by 2-PFA. Since it is easy to construct k-PFA for the predicate EqNZBk and for its negation, the following proposition can be proved in a similar way.

Proposition 2. If some relation R ⊆ N<sup>n</sup> is existentially FO-definable in the structure ⟨N; 0, 1, +, &k, EqNZBk, =⟩, then it is k-PFA-recognizable.

Based on Parikh's theorem [21], Klaedtke and Rueß proved decidability of the emptiness problem for PFA, and thus decidability of the existential WMSO-theory of ⟨N; S, EqCard⟩. They also proved that the universality problem for Parikh automata is undecidable. In contrast to finite automata, deterministic Parikh automata, where for every (q, a) ∈ Q × Σ<sup>n</sup><sub>k</sub> there exists at most one pair (q′, d) ∈ Q × D such that q′ ∈ δ(q, (a, d)), are less powerful than PFA. The paper by Cadilhac, Finkel and McKenzie [6] provides some explicit examples of languages recognizable by PFA but not by any deterministic PFA. These authors continued the study of other properties of PFA and, in particular, proved undecidability of the regularity property for PFA. This result will be used in Section 3.

## 2.2 Existential characterization of k-FA-recognizable languages

In this section we illustrate the main idea of the existential characterisation from Section 3. Our aim now is to prove the following theorem.

Theorem 1. For every integer k ≥ 2, a relation is k-FA-recognizable if and only if it is existentially definable in the structure ⟨N; 0, 1, +, &k, =⟩.

Proof. Let A = (Q, q0, F, δ) be a k-FA. We are going to prove existential definability of the relation RL(A) in the structure ⟨N; 0, 1, +, &k, =⟩ by encoding the existence of an accepting computation of A when the input word is the k-ary representation of x̄ = x1, ..., xn. To this end, let us first introduce new variables q̄ = q0, ..., qs, one for every state qi ∈ Q; for a state p ∈ Q, we denote by ν(p) its number from [0..s]. The following restriction on q̄ expresses the fact that at each step of a computation the automaton A has a unique state from Q:

$$K\_k(t, \overline{q}) \rightleftharpoons \bigwedge\_{0 \le i < j \le s} q\_i \&\_{k} q\_j = 0 \land q\_0 + ... + q\_s = \mathbf{1}\_k(t) \land 1 \preccurlyeq\_k q\_0 \land \bigvee\_{p \in F} t \preccurlyeq\_k q\_{\nu(p)}.\tag{4}$$

Here t is another existentially quantified variable, which will be a power of k. This variable corresponds to a configuration (p, ε) for some p ∈ F, and formula (4) also requires that the computation starts in the state q0. Clearly, t must be greater than xi for every i ∈ [1..n]; this restriction will appear in the resulting formula below.
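To see what formula (4) asks of the variables q0, ..., qs, the following Python sketch encodes a sequence of visited states as numbers and evaluates the four conjuncts. It is only an illustration under stated assumptions: &k is taken to be the digitwise minimum, x ≼k y is read as x &k y = x, and 1k(t) is the all-ones number of the same length as t. The helper names are hypothetical.

```python
def digit_min(x, y, k):
    """Assumed meaning of x &_k y: digitwise minimum in base k."""
    result, place = 0, 1
    while x > 0 and y > 0:
        result += min(x % k, y % k) * place
        x, y, place = x // k, y // k, place * k
    return result

def below(x, y, k):
    """Assumed meaning of x ≼_k y: x &_k y == x."""
    return digit_min(x, y, k) == x

def ones_like(t, k):
    """Assumed meaning of 1_k(t): 11...1 with as many base-k digits as t."""
    result, place = 0, 1
    while t > 0:
        result, t, place = result + place, t // k, place * k
    return result

def encode_run(states, num_states, k):
    """states[j] is the index of the state at step j; digit j of q_i is 1 iff states[j] == i."""
    t = k ** (len(states) - 1)
    q = [0] * num_states
    for j, p in enumerate(states):
        q[p] += k**j
    return t, q

def K(t, q, final, k):
    """Evaluate the conjuncts of formula (4) for concrete numbers."""
    s = len(q)
    disjoint = all(digit_min(q[i], q[j], k) == 0
                   for i in range(s) for j in range(i + 1, s))
    return (disjoint and sum(q) == ones_like(t, k)
            and below(1, q[0], k)
            and any(below(t, q[p], k) for p in final))
```

For the run 0, 1, 1, 2 with k = 3, `encode_run([0, 1, 1, 2], 3, 3)` returns t = 27 and q = [1, 12, 27], and `K(27, [1, 12, 27], {2}, 3)` holds because the top digit belongs to the final state with index 2.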

In order to express the fact that each step of a computation is performed in accordance with the transition function δ : Q × Σ<sup>n</sup><sub>k</sub> → 2<sup>Q</sup>, we introduce a predicate Δ(p,ā). For every pair (p, ā) ∈ Q × Σ<sup>n</sup><sub>k</sub>, we have

$$\Delta\_{(p,\overline{a})}(t,\overline{q},\overline{x}) \rightleftharpoons \Big(q\_{\nu(p)} \mathbin{\&\_k} \mathop{\&\_k}\limits\_{i \in [1..n]} \Theta\_{k,a\_i}(t,x\_i)\Big) \preccurlyeq\_k \Big(\mathop{|\_k}\limits\_{\widetilde{p}\in \delta(p,\overline{a})} \frac{q\_{\nu(\widetilde{p})}}{k}\Big),\tag{5}$$

where, by definition, |k<sub>y∈∅</sub> y = 0. From this formula we see that at each step of an accepting computation there are either no configurations with the state p and a word starting with the letter ā = (a1, ..., an), or in the next configuration the state will be from δ(p, ā). By combining formulas (4) and (5), we conclude that

$$R\_{L(\mathcal{A})}(\overline{x}) \Leftrightarrow \exists t \exists \overline{q} \left( P\_k(t) \land \bigwedge\_{i \in [1..n]} x\_i < t \land K\_k(t, \overline{q}) \land \bigwedge\_{(p, \overline{a}) \in Q \times \Sigma\_k^n} \Delta\_{(p, \overline{a})}(t, \overline{q}, \overline{x}) \right). \tag{6}$$

It remains to use formulas (1) and (2), Büchi-Bruyère's theorem and Lemmas 1 and 2. 

Corollary 1. If a relation is definable in the structure ⟨N; 0, 1, +, &k, =⟩, then it is existentially definable in this structure.

This result for k = 2 can be transferred to the second-order case similarly to Proposition 1. Thus, we obtain a corollary, which was essentially proved by Elgot [9, Theorem 5.3 (b)].

Corollary 2. If a relation R ⊆ F<sup>n</sup> is WMSO-definable in the structure ⟨N; S⟩, then it is existentially WMSO-definable in this structure.

## 3 First-order characterization of Parikh automata

The aim of this section is to prove the converse statement to Proposition 2 and thus obtain an existential first-order characterization of Parikh automata languages. The Parikh map over the natural numbers can be defined as a function Φk : N → N<sup>k</sup> such that Φk(x) = (#k,0(x), ..., #k,k−1(x)), where every function #k,i counts the number of occurrences of the digit i in the k-ary representation of x. For such counting functions we have the following lemma.

Lemma 4. *Let* R(x1, ..., xn) *be a relation that is existentially definable in the structure* ⟨N; 0, 1, +, =⟩*, and let* ā *be some vector from* {0, ..., k−1}<sup>n</sup>*. Then the relation* R(#k,a1(x1), ..., #k,an(xn)) *is* ∃*-definable in* ⟨N; 0, 1, +, &k, EqNZBk, =⟩*.*

*Proof.* It is sufficient to define the relations #k,a(x) = d for integers d ≥ 0 and #k,a(x) + #k,b(y) = #k,c(z) by some existential formulas. For the first relation we have the formula EqNZBk(Θk,a(x), k<sup>d</sup> − 1), and for the second one there is the following first-order analogue to formula (3):

$$\begin{aligned} \#\_{k,a}(x) + \#\_{k,b}(y) = \#\_{k,c}(z) &\Leftrightarrow \exists x' \exists y' (EqNZB\_k(x'+y', \Theta\_{k,c}(z)) \land \\ x' \&\_ky' = 0 \land EqNZB\_k(\Theta\_{k,a}(x), x') \land EqNZB\_k(\Theta\_{k,b}(y), y')). \end{aligned}$$

It remains to use existential definability of the graph of Θk,i in the structure ⟨N; 0, 1, +, &k, =⟩.

Note that every function #k,i can be represented in terms of Subsection 2.1 as #k,i(x) = |cod<sub>k</sub><sup>−1</sup>(Θk,i(x))|, and thus this lemma can also be proved using Lemma 3 and the first part of Proposition 1.
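The two formulas from the proof of Lemma 4 can be tested numerically. The Python sketch below assumes that EqNZBk(x, y) holds when x and y have the same number of non-zero k-ary digits and that &k is the digitwise minimum; both readings are assumptions about notation introduced earlier in the paper. It restricts itself to digits a, b, c ≥ 1 and checks the existential formula by constructing explicit witnesses x′ and y′. All function names are hypothetical.

```python
def digits(x, k):
    """Base-k digits of x, least significant first."""
    ds = []
    while x > 0:
        x, d = divmod(x, k)
        ds.append(d)
    return ds

def count(k, a, x):
    """#_{k,a}(x): occurrences of digit a in the base-k representation of x."""
    return digits(x, k).count(a)

def parikh(k, x):
    """The Parikh map Phi_k(x)."""
    return tuple(count(k, i, x) for i in range(k))

def theta(k, a, x):
    """Theta_{k,a}(x) for a >= 1."""
    return sum(k**i for i, d in enumerate(digits(x, k)) if d == a)

def digit_min(x, y, k):
    """Assumed meaning of x &_k y."""
    return sum(min(a, b) * k**i
               for i, (a, b) in enumerate(zip(digits(x, k), digits(y, k))))

def eq_nzb(x, y, k):
    """Assumed meaning of EqNZB_k: same number of non-zero digits."""
    nz = lambda v: sum(1 for d in digits(v, k) if d)
    return nz(x) == nz(y)

def sum_formula_holds(k, a, b, c, x, y, z):
    """Check the existential formula with witnesses x' (low ones) and y' (ones above)."""
    n, m = count(k, a, x), count(k, b, y)
    xp = sum(k**i for i in range(n))
    yp = sum(k**i for i in range(n, n + m))
    return (eq_nzb(xp + yp, theta(k, c, z), k) and digit_min(xp, yp, k) == 0
            and eq_nzb(theta(k, a, x), xp, k) and eq_nzb(theta(k, b, y), yp, k))
```

Since x′ and y′ have disjoint supports, adding them produces no carries, so the number of non-zero digits of x′ + y′ is the sum of the two counts. This is why the chosen witnesses succeed exactly when #k,a(x) + #k,b(y) = #k,c(z).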

Let D be some finite subset of N<sup>m</sup>, and let M(D) be the maximum integer occurring in D. Following Klaedtke and Rueß [16], we encode vectors from D of a given k-Parikh automaton by introducing M(D) + 1 new variables yi,0, ..., yi,M(D) for each coordinate yi. For every i ∈ [1..m], these variables will be pairwise *disjoint* (i.e. yi,j1 &k yi,j2 = 0 for j1 ≠ j2) and their representation in base k will contain only zeros and ones. For this reason, we use only #k,1 in our encoding and denote #k ⇌ #k,1.

Theorem 2. *For every integer* k ≥ 2*, a relation* R ⊆ N<sup>n</sup> *is* k*-PFA-recognizable if and only if it is* ∃*-definable in the structure* ⟨N; 0, 1, +, &k, EqNZBk, =⟩*.*

*Proof.* The "if" direction of this theorem is Proposition 2. In the proof of the "only if" direction, suppose we are given a k-Parikh automaton Aϕ for some finite set D ⊆ N<sup>m</sup>, where A = (Q, q0, F, δ) is an FA over the alphabet Σ<sup>n</sup><sub>k</sub> × D and ϕ is an existential L<sub>⟨0,1,+,=⟩</sub>-formula. We are going to construct an existential L<sub>⟨0,1,+,&k,EqNZBk,=⟩</sub>-formula ψ such that RL(Aϕ)(a) if and only if ψ(a) for every a ∈ N<sup>n</sup>. Again, ψ(x̄) will encode the existence of an accepting computation of Aϕ when the input word is the k-ary representation of x̄.

The sequence of states from an accepting computation of A can be encoded using the predicate Kk(t, q), defined by the existential L0,1,+,&k,=-formula (4).

We modify formula (5) so that it works with the alphabet Σ<sup>n</sup><sub>k</sub> × D. To this end, let us introduce m(M(D) + 1) variables ȳ = y1,0, ..., y1,M(D), ..., ym,0, ..., ym,M(D) such that for every i ∈ [1..m] it holds that θk(t, yi,0, ..., yi,M(D)), where

$$\theta\_k(t, y\_0, \ldots, y\_M) \rightleftharpoons \bigwedge\_{0 \le i < j \le M} y\_i \&\_k y\_j = 0 \land y\_0 + \ldots + y\_M = \mathbf{1}\_k(t).$$

Now for every (p, ā, d̄) ∈ Q × Σ<sup>n</sup><sub>k</sub> × D we have:

$$\begin{split} \Delta\_{(p,\overline{a},\overline{d})}(t,\overline{q},\overline{x},\overline{y}) \rightleftharpoons \Big(q\_{\nu(p)} \mathbin{\&\_k} \mathop{\&\_k}\limits\_{i\in[1..n]} \Theta\_{k,a\_i}(t,x\_i) \mathbin{\&\_k} \mathop{\&\_k}\limits\_{j\in[1..m]} y\_{j,d\_j}\Big) \preccurlyeq\_k \\ \Big(\mathop{|\_k}\limits\_{\widetilde{p}\in\delta(p,\overline{a},\overline{d})} \frac{q\_{\nu(\widetilde{p})}}{k}\Big). \end{split}$$

Recall that the expression with bitwise maximums |k evaluates to zero when δ(p, ā, d̄) = ∅.

By combining all the parts of the existential definition of RL(Aϕ), we get the following analogue to formula (6):

$$\begin{split} R\_{L(\mathcal{A}\_{\varphi})}(\overline{x}) &\Leftrightarrow \exists t \exists \overline{q} \exists \overline{y} \Big( P\_{k}(t) \land \bigwedge\_{i \in [1..n]} x\_{i} < t \land K\_{k}(t, \overline{q}) \land \\ &\bigwedge\_{i \in [1..m]} \theta\_{k}(t, y\_{i,0}, ..., y\_{i,M(D)}) \land \bigwedge\_{\langle p, \overline{a}, \overline{d} \rangle \in Q \times \Sigma\_{k}^{n} \times D} \Delta\_{\langle p, \overline{a}, \overline{d} \rangle}(t, \overline{q}, \overline{x}, \overline{y}) \land \\ &\varphi \Big( \sum\_{c \in [1..M(D)]} c \#\_{k}(y\_{1,c}), ..., \sum\_{c \in [1..M(D)]} c \#\_{k}(y\_{m,c}) \Big) \Big) .\end{split}$$

It remains to apply Lemma 4 to obtain the desired existential formula.

This result gives us the following statement concerning decidability of fragments of the first-order theory of the structure ⟨N; 0, 1, +, &k, EqNZBk, =⟩.

Corollary 3. *The existential theory of* ⟨N; 0, 1, +, &k, EqNZBk, =⟩ *is decidable and the* ∀∃*-theory of this structure is undecidable.*

*Proof.* The first part of the corollary is just a variation on the automata-theoretic techniques that were formalized by Hodgson [12]. It follows from the decidability of the emptiness problem for PFA. Undecidability of the universality problem, combined with Theorem 2, implies undecidability already for the problem of deciding ∀∃-formulas with a single universal quantifier.

Haase and Różycki [11, Conclusion] ask whether the property of ∃-definability is decidable for the relations definable in the structure ⟨N; 0, 1, +, Vk, =⟩. Using Theorem 1, this problem can be reformulated so that we consider only existentially definable sets, but now the signatures are different. Namely, the question is whether we can decide if a set ∃-definable in the structure ⟨N; 0, 1, +, Vk, &k, =⟩ is ∃-definable in ⟨N; 0, 1, +, Vk, =⟩. A similar question can be answered in the negative for the structure with &k and EqNZBk.

Proposition 3. *The problem of deciding whether a set existentially definable in the structure* ⟨N; 0, 1, +, &k, EqNZBk, =⟩ *is* ∃*-definable in* ⟨N; 0, 1, +, &k, =⟩ *is undecidable.*

This follows from Theorems 1 and 2, and from undecidability of the regularity property for Parikh automata, which was proved by Cadilhac, Finkel and McKenzie [6, Proposition 7].

Parikh automata are closely related to multi-counter machines (MCM): they recognize exactly the same languages as reversal-bounded MCM [15, Section A.3] (see also [6, Subsection 3.3]). Recall that an MCM is *reversal-bounded* (the notion was introduced by Ibarra [13]) if there exists a pair of integers (r, s) such that in every accepting computation the value of each counter alternates between increasing and decreasing at most r times and the input head reverses at most s times. Theorem 2 now gives an existential first-order characterization of this restricted version of MCM. It is clear that the model of PFA is more suitable for our logical descriptions. However, as we will see in the next section, the behaviour of MCM can be described in a similar way when the structure is extended with concatenation.

## 4 Multi-counter machines and DPR-theorem

#### 4.1 Two-way multi-counter machines

Following Ibarra [13], we define a *two-way multi-counter machine* M over an alphabet Σ (Σ*-MCM*) with two special delimiter symbols ⊢, ⊣ as a tuple (m, Q, q0, F, δ). Here, m ≥ 0 is the number of counters of M, the triple (Q, q0, F) has its standard meaning, and δ is a function from Q × (Σ ∪ {⊢, ⊣}) × {0, 1}<sup>m</sup> to 2<sup>Q×{−1,0,1}<sup>m+1</sup></sup>. Every computation of M starts with an input x ∈ Σ<sup>∗</sup> written on the tape between the delimiters, ⊢x⊣, and with the input head of M reading the left delimiter ⊢. A configuration of M on an input x is given by an (m + 3)-tuple (q, ⊢x⊣, i, y1, ..., ym) denoting the fact that M is in state q, the read-only input head scans the i-th symbol of the input, and y1, ..., ym are the non-negative integer values of the counters. The relation → over configurations is defined such that (q, ⊢x⊣, i, y1, ..., ym) → (q′, ⊢x⊣, i + Δ, y1 + d1, ..., ym + dm) if and only if (q′, Δ, d1, ..., dm) ∈ δ(q, a, [y1 > 0], ..., [ym > 0]), where a is the i-th symbol of the input and [y > 0] returns 1 if y > 0, and 0 otherwise. A natural restriction on δ prevents the cases when: (1) [yj > 0] = 0 and dj = −1; (2) i = 0 and Δ = −1; (3) the i-th symbol of the input is ⊣ and Δ = 1.

We say that x ∈ Σ<sup>∗</sup> is accepted by a given Σ-MCM if for the input word x there is a computation (q0, ⊢x⊣, 0, 0, ..., 0) → ... → (qf, ⊢x⊣, 0, 0, ..., 0) for some qf ∈ F. The set of all the words x ∈ Σ<sup>∗</sup> accepted by a Σ-MCM M defines the language recognized by this machine, which we denote by L(M). In order to properly relate Σ-MCM with definability over N, we again assume that Σ = Σ<sup>n</sup><sub>k</sub> for k ≥ 2. Every x ∈ Σ<sup>∗</sup> is now an element of N<sup>n</sup> in the inverse base k representation. An n-ary relation R over N is called k*-MCM-recognizable* if there exists a Σ<sup>n</sup><sub>k</sub>-MCM M such that for every a ∈ N<sup>n</sup> we have R(a) ⇔ RL(M)(a).
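The definitions above can be exercised with a small breadth-first simulator in Python. This is a sketch under stated assumptions: the characters `<` and `>` are hypothetical stand-ins for the two delimiters, transitions are stored as (next state, head move, counter updates) triples keyed by (state, symbol, zero tests), and the search gives up after a bounded number of configurations, since the configuration space of an MCM is infinite in general. The example machine, which recognizes words of the form 0^n 1^n, is also an illustration and does not come from the paper.

```python
from collections import deque

LEFT, RIGHT = "<", ">"  # hypothetical stand-ins for the two delimiters

def mcm_accepts(m, delta, q0, final, word, max_configs=100_000):
    """Search for a computation ending in a final state, head at 0, counters all 0."""
    tape = [LEFT] + list(word) + [RIGHT]
    start = (q0, 0, (0,) * m)
    seen, queue = {start}, deque([start])
    while queue and len(seen) <= max_configs:
        q, i, ys = queue.popleft()
        if q in final and i == 0 and all(y == 0 for y in ys):
            return True
        tests = tuple(1 if y > 0 else 0 for y in ys)
        for (q2, move, ds) in delta.get((q, tape[i], tests), set()):
            i2 = i + move
            ys2 = tuple(y + d for y, d in zip(ys, ds))
            # Enforce the natural restriction: stay on the tape, counters >= 0.
            if not 0 <= i2 < len(tape) or min(ys2, default=0) < 0:
                continue
            nxt = (q2, i2, ys2)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False  # rejected, or the search bound was reached

# Hypothetical one-counter machine for { 0^n 1^n : n >= 0 }.
EXAMPLE_MCM = {
    ("start", LEFT, (0,)): {("zeros", 1, (0,))},
    ("zeros", "0", (0,)): {("zeros", 1, (1,))},
    ("zeros", "0", (1,)): {("zeros", 1, (1,))},
    ("zeros", "1", (1,)): {("ones", 1, (-1,))},
    ("ones", "1", (1,)): {("ones", 1, (-1,))},
    ("zeros", RIGHT, (0,)): {("back", -1, (0,))},
    ("ones", RIGHT, (0,)): {("back", -1, (0,))},
    ("back", "0", (0,)): {("back", -1, (0,))},
    ("back", "1", (0,)): {("back", -1, (0,))},
    ("back", LEFT, (0,)): {("acc", 0, (0,))},
}
```

The machine counts the zeros up, counts the ones down, and walks back to the left delimiter only if the counter is empty at the right delimiter, so that acceptance happens in the required configuration with head position 0 and all counters 0.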

Two-way multi-counter machines can simulate Turing machines (see e.g. [17]), and thus a relation R ⊆ N<sup>n</sup> is r.e. if and only if it is k-MCM-recognizable. The aim of this section is to use the same arguments as in the cases of k-FA and k-PFA in order to obtain an existential characterization of r.e. relations, and Theorem 3 gives us the desired result. The proof will be in some sense intermediate between the arithmetization of Turing machines by Matiyasevich [19] and the encoding of register machines by Jones and Matiyasevich in [14], but here we emphasize the role of concatenation in existential characterizations of multi-counter languages.

### 4.2 The role of concatenation in DPR-theorem

Matiyasevich's proof [19] implicitly gives us a description of every r.e. set via ∃-formulas of the first-order language with 0, 1, addition, bitwise multiplication &2, concatenation ⌢2, and equality. Here, t = x ⌢k y ⇌ t = x + k<sup>lk(x)</sup>y = x + kλk(x)y, where lk(x) is the length of x in k-ary notation. This section aims to prove this theorem using the ideas from Subsection 2.2. Informally speaking, the main difference between the cases of k-MCM and k-FA is that we now consider bytewise multiplication instead of the bitwise one from Theorem 1. Suppose a given k-MCM accepts x ∈ Σ<sup>n</sup><sub>k</sub>, and let M be the maximum value of all the counters in some accepting computation for x. If u is a power of k which is greater than the maximum of k<sup>M</sup> and all the xi, then lk(u) will be the size of the byte in our encoding. Every non-negative integer can be represented as a sequence of bytes of size lk(u), which will be called u-bytes.
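Concatenation and the splitting of a number into u-bytes can be sketched in Python as follows. The function names are hypothetical, and λk(x) is taken to be the largest power of k not exceeding x, which is the reading suggested by the identity k<sup>lk(x)</sup>y = kλk(x)y above; the sketch only treats x ≥ 1 in `lam`.

```python
def length(x, k):
    """l_k(x): number of base-k digits of x (0 for x = 0, an assumption)."""
    n = 0
    while x > 0:
        x //= k
        n += 1
    return n

def lam(x, k):
    """Assumed meaning of lambda_k(x) for x >= 1: largest power of k not exceeding x."""
    return k ** (length(x, k) - 1)

def concat(x, y, k):
    """x concatenated with y: the digits of y are placed above the digits of x."""
    return x + k ** length(x, k) * y

def u_bytes(x, u, k):
    """Split x into u-bytes (blocks of l_k(u) digits), least significant first."""
    base = k ** length(u, k)
    blocks = []
    while x > 0:
        x, block = divmod(x, base)
        blocks.append(block)
    return blocks
```

In base 10, `concat(12, 34, 10)` is 3412, so x supplies the low-order digits, and `u_bytes(3412, 10, 10)` returns `[12, 34]` because l10(10) = 2.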

First, we introduce some auxiliary devices, which are required in our construction. Define the predicate Δk(u, t, x), which is true when u is a power of k greater than k<sup>2</sup>, the variable x has the same u-byte-length as t and has the following form

$$x = \underbrace{1000...0}\_{l\_k(u)} \* \dots \* \underbrace{0\dots010\dots0}\_{l\_k(u)} \dots \underbrace{000\dots001}\_{l\_k(u)}$$

where ∗ ∗ is either 10 or 01, and for every two consecutive u-bytes b1, b2 in x the only 1 in b2 is either in the same place or one bit left/right of its position in b1. Moreover, the two most significant bits in every u-byte are equal to zero. We will use this predicate to describe the position of the input head and the values of the counters in configurations of a given k-MCM. Before we proceed with the existential definition of this relation, we need to introduce some auxiliary functions. The first one performs the right shift by lk(z) bits and can be defined via the formula y = x/z ⇔ ∃v∃u(λk(z) = u ∧ λk(v) ≤ u ∧ x = u ⌢k y − u + v). The second function is Copyk(u, t, x), which maps to zero when λk(u) < λk(x), and otherwise gives us the sequence of u-bytes of the same u-byte-length as t such that each u-byte is equal to x. The following lemma gives the desired definition, and then we immediately prove existential definability of Δk(u, t, x).

Lemma 5. The function Copyk is ∃-definable in ⟨N; 0, 1, +, &k, ⌢k, =⟩.

Proof. We start with the predicate Cpyk(x, y), which is true whenever y has the form x ⌢k ... ⌢k x. Its definition is rather standard:

$$C p y\_k(x, y) \Leftrightarrow y = x \lor \exists z (y = x \frown\_k z \land y = z \frown\_k x).$$

The predicate Ik(u, x) ⇔ x = 1 ∨ ∃y(Cpyk(λk(u), y) ∧ x = ky + 1) is another special case of Copyk, which is true when x is a sequence of u-bytes, each of which is equal to 1. Then, the minimum power of k of the same u-byte-length as x can be expressed as y = Λk(u, x) ⇔ ∃v(Ik(u, v) ∧ v ≤ x ∧ v ⌢k u > x ∧ y = λk(v)).

It is now clear that

$$\begin{aligned} y = Copy\_k(u, t, x) \Leftrightarrow{} & \lambda\_k(u) < \lambda\_k(x) \land y = 0 \lor \Lambda\_k(u, y) = \Lambda\_k(u, t) \land \\ & \Big(\lambda\_k(u) = \lambda\_k(x) \land Cpy\_k(x, y) \lor \lambda\_k(u) > \lambda\_k(x) \land \exists y' \exists y'' \big( \\ & Cpy\_k(x + \lambda\_k(u), y') \land Cpy\_k(\lambda\_k(u), y'') \land \lambda\_k(y') = \lambda\_k(y'') \land y = y' - y''\big)\Big). \end{aligned}$$

In this formula, the variables y′ and y″ are introduced in order to supplement every u-byte with a sufficient number of leading zeros.

Lemma 6. *The relation* $\Delta\_k$ *is* ∃*-definable in* $\langle \mathbb{N}; 0, 1, +, \&\_k, \frown\_k, = \rangle$*.*

*Proof.* We are going to prove the correctness of the following definition:

$$\begin{aligned} \Delta\_k(u, t, x) \Leftrightarrow \exists z\_1 \exists z\_2 \exists x\_1 \exists x\_2 \exists x\_3 \Big( & P\_k(u) \land k^3 \le u \land z\_1 = Copy\_k(u, t, 1) \land {} \\ & \lambda\_k(z\_1) = \lambda\_k(x) \land x \&\_k (ku - 1) = 1 \land x \preccurlyeq\_k 1\_k(z\_1) \land {} \end{aligned} \tag{7}$$

$$x\_1 = \frac{(kx)}{u} \land x\_2 = \frac{x}{u} \land x\_3 = \frac{x}{ku} \land x = \lambda\_k(x) + x\&\_kx\_1 + x\&\_kx\_2 + x\&\_kx\_3 \land \tag{8}$$

$$x\_1 \&\_k x\_2 = 0 \land x\_1 \&\_k x\_3 = 0 \land x\_2 \&\_k x\_3 = 0 \land \tag{9}$$

$$z\_2 = Copy\_k(u, t, u) \land x \&\_k \Big(z\_2 + \frac{z\_2}{k}\Big) = 0\Big). \tag{10}$$

Conjunction (7) expresses that x is a sequence of the same number of u-bytes as t that starts and ends with the u-byte 000...01, and in every u-byte there can only be zeros and ones. Condition (10) specifies that the two most significant bits in every u-byte of x are equal to zero. Next, the variables $x\_1, x\_2, x\_3$ correspond to the right shifts of x by one u-byte plus $D \in \{-1, 0, +1\}$. Let us prove that in every u-byte there is a unique 1 and that it has the same position plus $D \in \{-1, 0, +1\}$ compared to the previous u-byte.

From (8), we see that in every u-byte of x there is at least one 1. Indeed, if <sup>x</sup> <sup>=</sup> <sup>u</sup> then the first <sup>u</sup>-byte of <sup>x</sup>1, or <sup>x</sup>2, or <sup>x</sup><sup>3</sup> must contain <sup>1</sup> (the least significant bit); thus, the second u-byte of x is also non-zero, etc. This 1 in every u-byte is in the desired position since the values x&kx1, x&kx2, x&kx<sup>3</sup> describe the three cases in which the position in the next <sup>u</sup>-byte is the same plus <sup>−</sup>1, <sup>0</sup>, +1, respectively.

Now we prove that there are no other non-zero bits in any u-byte of x. Assume for a contradiction that there is a u-byte in x with more than one 1. Then there are two consecutive u-bytes (which are depicted on the next page) such that the left u-byte has exactly one 1, and the right one has at least two 1s. This pair exists because the most significant u-byte of x equals 1. From the representation of x in (8), we see that the bits a, b, f, g are all equal to zero. Next, since by (9) $x\_1$, $x\_2$ and $x\_3$ are pairwise disjoint, among c, d and e there is only one 1. This contradicts our assumption.


It remains to prove that for every u and x such that $\Delta\_k(u, t, x)$ there exist non-negative integers from the definition above. This is obvious for $z\_1$ and $z\_2$; the existence of $x\_1, x\_2, x\_3$ follows from the fact that there are at least two zeros between every pair of 1s in x.
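The property that (7)–(10) are meant to capture can also be tested directly on the digits of x. The sketch below is only an illustration of that intended meaning as described in the prose; it does not evaluate the existential formula. It assumes u is a power of k and takes the number of u-bytes as an explicit parameter `n_bytes` instead of deriving it from t; both function names are hypothetical.

```python
def one_position(k: int, byte: int):
    """Return p if byte == k**p (a single digit 1, all other digits 0), else None."""
    p = 0
    while byte > 1 and byte % k == 0:
        byte //= k
        p += 1
    return p if byte == 1 else None


def is_delta(k: int, u: int, n_bytes: int, x: int) -> bool:
    """Check the byte-level property described for Delta_k on exactly n_bytes u-bytes."""
    if n_bytes < 1:
        return False
    base = k * u                      # weight of one u-byte
    positions = []
    for _ in range(n_bytes):
        byte = x % base
        p = one_position(k, byte)
        # exactly one digit 1, and the two most significant digits must be zero
        if p is None or byte * k * k > u:
            return False
        positions.append(p)
        x //= base
    if x != 0:                        # x is longer than n_bytes u-bytes
        return False
    if positions[0] != 0 or positions[-1] != 0:
        return False                  # first and last u-byte must be 00...01
    # the single 1 moves by at most one place between consecutive u-bytes
    return all(abs(a - b) <= 1 for a, b in zip(positions, positions[1:]))
```

With k = 2 and u = 1000 (binary), the bytes 0001, 0010, 0001 are accepted, while any byte with two 1s or a jump of two places is rejected.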

In our proof we check whether or not the u-bytewise minimum of two natural numbers equals zero. In order to express this property, let us introduce a function $U\_k$ which modifies x as follows. If x can be split into consecutive u-bytes where the most significant bit is equal to zero, then $U\_k(u, x)$ replaces every non-zero u-byte by 1. Otherwise, this function maps to zero. For example, when x = 10 000 011 000 010 we have $U\_2(100, x)$ = 1 000 001 000 001 and $U\_2(1000, x) = 0$.
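The example above can be reproduced with a direct computation. The following Python sketch evaluates the function on the digits of x, assuming u is a power of k; it illustrates the intended meaning only and is unrelated to the existential definition given in Lemma 7. The name `u_k` is illustrative.

```python
def u_k(k: int, u: int, x: int) -> int:
    """Replace every non-zero u-byte of x by 00...01; return 0 if some u-byte
    has a non-zero most significant digit. Assumes u is a power of k."""
    base = k * u              # u = k**(l-1), so one u-byte has weight k**l
    result, weight = 0, 1
    while x > 0:
        byte = x % base
        if byte >= u:         # most significant digit of this u-byte is non-zero
            return 0
        if byte != 0:
            result += weight  # this u-byte becomes 00...01
        x //= base
        weight *= base
    return result
```

Calling `u_k(2, 0b100, 0b10000011000010)` gives the value written 1 000 001 000 001 in binary, and `u_k(2, 0b1000, 0b10000011000010)` gives 0 because the second 4-bit byte is 1100.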

Lemma 7. The function $U\_k$ is ∃-definable in $\langle \mathbb{N}; 0, 1, +, \&\_k, \frown\_k, = \rangle$.

Proof. Let us first define a predicate $\overline{U\_k}$, which (in comparison with the function $U\_k$) is also true for the cases when y has u-bytes equal to 1 while the corresponding u-bytes of x are equal to zero. In $\overline{U\_k}$ there are also no restrictions on the most significant bits of u-bytes. We have the definition

$$\begin{aligned} \overline{U\_k}(u, x, y) \Leftrightarrow \exists t \exists t' \exists v \Big(Cpy\_k(\lambda\_k(u), t) \land t' \preceq\_k t \land v = kt' - \frac{(kt')}{u} \land x \prec\_k v \land \Big(\frac{x}{u}\Big) \land \\ &y = v \&\_k Copy\_k(u, x, 1) \Big). \end{aligned}$$

The k-ary representation of v is a sequence of u-bytes which are either zero or equal to $ku - 1$; moreover, for every unit in x there is $(k-1)$ in v. Then we select the desired 1 in y via a bitwise multiplication of v by a sequence of u-bytes of the same u-byte-length as x, where all bytes are equal to 1.

In order to exclude extra non-zero u-bytes from y, we consider the difference $kx - y$. Recall that the definition of $U\_k$ requires the most significant bit in every u-byte to be zero. Thus, we have

$$\begin{aligned} y = U\_k(u, x) \Leftrightarrow{} & x \&\_{k} Copy\_k(u, x, u) > 0 \land y = 0 \lor {} \\ & x \&\_{k} Copy\_k(u, x, u) = 0 \land \overline{U\_k}(u, x, y) \land (k - 1)y \preccurlyeq\_k (kx - y). \end{aligned} \tag{11}$$

Consider the case when the most significant bits in u-bytes of x are all zero. The least significant bit in every u-byte of kx now equals 0, and the fact that there is a unique y that satisfies the definition can be illustrated as follows:

$$\begin{aligned} \underbrace{\ldots\star\ldots\star 1}\_{l\_{k}(u)} & \xrightarrow{l\_{k}(u)} \ldots\star\underbrace{\ldots\star\ldots\star}\_{l\_{k}(u)} \quad \overbrace{0\ldots\cdots\star}\_{l\_{k}(u)}\\ \ldots\underbrace{0\ldots0\cdots0}\_{l\_{k}(u)} & \xrightarrow{l\_{k}(u)} \ldots\underbrace{0\ldots0\cdots1}\_{l\_{k}(u)} \quad \overbrace{0\ldots\cdots0}\_{l\_{k}(u)} \quad \overbrace{1\ldots\cdots0}\_{l\_{k}(u)} \quad \ldots\\ \ldots\underbrace{\star\ldots\star 0(k-1)\ldots(k-1)(k-1)}\_{l\_{k}(u)} & \xrightarrow{\ldots\star}\_{l\_{k}(u)} \quad \overbrace{k-1}^{}(k-1)\ldots(k-1)(k-1)\ldots \end{aligned}$$

These three lines represent the numbers kx, y, and (kx − y), respectively. The left column demonstrates the general "correct" case. The middle and the right columns show why the existence of an extra non-zero u-byte in y contradicts definition (11).

We are now able to prove the main result of this section.

Theorem 3. *For every integer* k ≥ 2 *a relation is* k*-MCM-recognizable if and only if it is* ∃*-definable in the structure* $\langle \mathbb{N}; 0, 1, +, \&\_k, \frown\_k, = \rangle$*. Therefore, every relation* $R \subseteq \mathbb{N}^n$ *is r.e. iff it is* ∃*-definable in this structure.*

*Proof.* For a given k-MCM $\mathcal{M} = (m, Q, q\_0, F, \delta)$ and an input vector $\overline{x} \in \mathbb{N}^n$ in k-ary notation, we are going to encode the existence of an accepting sequence of transitions between configurations of $\mathcal{M}$. First choose a variable u such that $P\_k(u) \land \bigwedge\_{i \in [1..n]} k^4 x\_i \le u$; this choice specifies the size of bytes in our encoding.

We multiply by $k^4$ since in a u-byte there must be two bits for the delimiters ⊢, ⊣, and at least two auxiliary zeros from the definition of $\Delta\_k$.

A sequence of states is encoded similarly to formula (4), that is,

$$K\_k(u, t, \overline{q}) \implies \bigwedge\_{0 \le i < j \le s} q\_i \&\_k q\_j = 0 \land q\_0 + ... + q\_s = Copy\_k(u, t, 1) \land$$

$$1 \preccurlyeq\_k q\_0 \land \bigvee\_{p \in F} \Lambda\_k(u, t) \preccurlyeq\_k q\_{\nu(p)},$$

where $\overline{q} = q\_0, \dots, q\_s$ and t corresponds to the number of steps of an accepting computation of $\mathcal{M}$. Here we also require $q\_0$ to be the initial state and the most significant u-byte of t to correspond to a final configuration.

We now define a predicate $C\_{\mathcal{M}}$ that encodes a sequence of configurations of $\mathcal{M}$. Similar to Matiyasevich [19], in this definition for every $x\_i \in \overline{x}$ a sequence of copies of $x\_i$ is decomposed into disjoint variables $\theta\_{i,0}, \dots, \theta\_{i,k-1}$ such that every u-byte of $\theta\_{i,a}$ equals $\Theta\_{k,a}(x\_i)$. Let $\overline{\theta}$ denote the list of variables $\theta\_{1,0}, \dots, \theta\_{1,k-1}, \theta\_{2,0}, \dots, \theta\_{n,k-1}, \theta\_{\vdash}, \theta\_{\dashv}$, where the extra variables $\theta\_{\vdash}, \theta\_{\dashv}$ encode the positions of the delimiters. The variable h stores the positions of the input head of $\mathcal{M}$, and the list of variables $\overline{y} = y\_1, \dots, y\_m$ corresponds to the values of the counters at each step of the computation.


It is convenient to introduce a function $b\_k$, which gives the smallest power of k greater than every $x\_i \in \overline{x}$. The graph of this function can be defined as

$$y = b\_k(\overline{x}) \Leftrightarrow \bigvee\_{i \in [1..n]} y = k\lambda\_k(x\_i) \land \bigwedge\_{i \in [1..n]} y \ge k\lambda\_k(x\_i).$$
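As a quick illustration of this function, the following Python sketch computes the smallest power of k that is strictly greater than every entry of the input list. It is a direct computation, not the existential definition, and it assumes all entries are positive so that the question of how λ_k treats zero does not arise. The function name mirrors the text; the implementation is an assumption.

```python
def b_k(k: int, xs: list) -> int:
    """Smallest power of k strictly greater than every x in xs (entries assumed positive)."""
    y = 1
    while any(y <= x for x in xs):
        y *= k
    return y
```

For example, with k = 2 and inputs 5 and 3 the result is 8, which agrees with k·λ_2(5) = 2·4.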

This function will be applied to encode the positions of the right delimiter ⊣. The following formula describes a sequence of configurations of $\mathcal{M}$.

$$\begin{aligned} C\_{\mathcal{M}}(u, t, \overline{q}, \overline{x}, \overline{\theta}, h, \overline{y}) \Longrightarrow{} & P\_k(u) \land \bigwedge\_{i \in [1..n]} k^4 x\_i \le u \land u \le t \land K\_k(u, t, \overline{q}) \land {} \\ & \theta\_{\vdash} = Copy\_k(u, t, 1) \land \bigwedge\_{i \in [1..n]} \Big(\theta\_{i, 0} = Copy\_k(u, t, k\Theta\_{k, 0}(x\_i + b\_k(\overline{x}))) \land {} \\ & \bigwedge\_{a \in [1..k-1]} \theta\_{i, a} = Copy\_k(u, t, k\Theta\_{k, a}(x\_i))\Big) \land \theta\_{\dashv} = Copy\_k(u, t, kb\_k(\overline{x})) \land {} \\ & \Delta\_k(u, t, h) \land \bigwedge\_{i \in [1..m]} \Delta\_k(u, t, y\_i). \end{aligned}$$

It is easy to see that $\theta\_{\vdash}, \theta\_{\dashv}$ are disjoint with the other variables from $\overline{\theta}$. For notational convenience, we subsequently assume that $\theta\_{i,\vdash} = \theta\_{\vdash}$ and $\theta\_{i,\dashv} = \theta\_{\dashv}$ for every i ∈ [1..n], and that the letters for the delimiters are the vectors (⊢, ..., ⊢) and (⊣, ..., ⊣) of length n.

We now proceed to the encoding of the fact that a given sequence of configurations is actually a sequence of transitions in $\mathcal{M}$. For a letter $(a\_1, \dots, a\_n) \in \Sigma\_k^n \cup \{\vdash, \dashv\}$, a state $p \in Q$, and a tuple $\overline{c} \in \{0, 1\}^m$ such that the values of the counters from $Y\_{\overline{c}} = \{i \in [1..m] \mid c\_i = 0\}$ are equal to zero and from $[1..m] \setminus Y\_{\overline{c}}$ are non-zero, the following formula is an analogue to definition (5):

$$
\begin{split}
\Delta\_{(p,\overline{\mathfrak{a}},\overline{\mathfrak{c}})}(u,t,\overline{q},\overline{\mathfrak{b}},h,\overline{\mathfrak{y}}) & \Longrightarrow \left(q\_{\nu(p)}\&\_{k\_{k}}\;\widecheck{\mathfrak{Q}}\_{k\_{k}}\;U\_{k}(u,\{\theta\_{i,a\_{i}}\&\_{k}h\})\right)\&\_{k} \\ & \qquad \&\limits\_{i\in Y\_{\mathsf{T}}}y\_{i}\&\_{k\_{k}}\;\not\&\mathbb{Z}\_{k\_{k}}\;U\_{k}(u,y\_{i}-Copy\_{k}(u,t,1)\&\_{k}y\_{i})\right)\; \not\vdash\_{k} \\ & \qquad \Big\lvert\_{k} & \qquad \left(\begin{matrix} q\_{\nu(\overline{p})} \\ u \end{matrix}\&\_{k}U\_{k}(u,h\&\_{k}\;\frac{(k^{d}h)}{u})\&\_{k}\;\not\&\mathbb{Z}\_{k}\;U\_{k}(u,y\_{i}\&\_{k}\;\frac{(k^{d\_{i}}y\_{i})}{u})\right).
\end{split}
$$

The key difference with (5) is that now in order to compare two consecutive configurations we shift by one u-byte instead of one bit. It is obvious that the predicate $\Delta\_{(p,\overline{a},\overline{c})}$ makes sense when it is complemented with $C\_{\mathcal{M}}$. In this case, for example, $U\_k(u, h \&\_k \frac{(k^d h)}{u})$ highlights the configurations for which in the following configuration the position of the input head shifts by d. Indeed, we obtain a sequence of u-bytes, each of which is equal to one if and only if the position of the unique 1 in the next u-byte is the same plus d, otherwise this u-byte is equal to zero.

It remains to define the relation $R\_{L(\mathcal{M})}$ that corresponds to the language recognizable by $\mathcal{M}$. To this end, we have to consider every tuple $(p, \overline{a}, \overline{c})$ in $Q \times (\Sigma\_k^n \cup \{\vdash, \dashv\}) \times \{0, 1\}^m$ and apply the already defined predicates $C\_{\mathcal{M}}$ and $\Delta\_{(p,\overline{a},\overline{c})}$.

$$\begin{split} R\_{L(\mathcal{M})}(\overline{x}) \Leftrightarrow & \exists u \exists t \exists \overline{q} \exists \overline{\theta} \exists h \exists \overline{y} \Big( C\_{\mathcal{M}}(u, t, \overline{q}, \overline{x}, \overline{\theta}, h, \overline{y}) \land \\ & \bigwedge\_{(p, \overline{a}, \overline{c}) \in Q \times (\Sigma\_{k}^{n} \cup \{\vdash, \dashv\}) \times \{0, 1\}^{m}} \Delta\_{(p, \overline{a}, \overline{c})}(u, t, \overline{q}, \overline{\theta}, h, \overline{y}) \Big). \end{split}$$

This completes the proof. 

Since by [14,19] the bitwise minimum operation $\&\_2$ is existentially definable in $\langle \mathbb{N}; 0, 1, +, \cdot, \exp, = \rangle$, we obtain the DPR-theorem as a corollary.

Corollary 4 (DPR-theorem). *Every relation* $R \subseteq \mathbb{N}^n$ *is r.e. if and only if it is* ∃*-definable in the structure* $\langle \mathbb{N}; 0, 1, +, \cdot, \exp, = \rangle$*.*

Let us fix k = 2 and omit mentioning k in $\frown\_k$ and $EqNZB\_k$. Since we have $z = x \&\_2 y \Leftrightarrow z \preccurlyeq y \land y \preccurlyeq x + y - z$ (see [14]), bitwise minimum is ∃-definable in $\langle \mathbb{N}; 0, 1, +, \preccurlyeq, \frown, = \rangle$. Next, exponential diophantiness of $\preccurlyeq$ follows from the fact that $x \preccurlyeq y$ iff $\binom{y}{x} \equiv 1 \pmod 2$, where $\binom{y}{x}$ is a binomial coefficient. Factorial representation of binomial coefficients and Legendre's formula imply that

$$x \preccurlyeq y \Leftrightarrow s\_2(y) = s\_2(x) + s\_2(y - x),$$

where $s\_2(x)$ is the number of 1's in the base 2 expansion of x. Therefore, the masking relation is definable by the formula $x \preccurlyeq y \Leftrightarrow EqNZB(y, x \frown (y - x))$ and we have the following result.
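The equivalences used in this paragraph can be checked by brute force on small numbers. The following Python sketch does so for k = 2: it compares the masking relation with the parity of the binomial coefficient and with the digit-sum identity, and it compares bitwise minimum with its definition through masking. The names `masks`, `s2` and `check` are illustrative, and the check is an experiment over a finite range, not a proof.

```python
from math import comb


def masks(x: int, y: int) -> bool:
    """Masking for k = 2: every 1-bit of x is also a 1-bit of y."""
    return (x & y) == x


def s2(x: int) -> int:
    """Number of 1s in the binary expansion of x."""
    return bin(x).count("1")


def check(limit: int) -> bool:
    """Verify the three characterisations for all arguments below `limit`."""
    for x in range(limit):
        for y in range(limit):
            odd_binomial = comb(y, x) % 2 == 1
            digit_sums = x <= y and s2(y) == s2(x) + s2(y - x)
            if not (masks(x, y) == odd_binomial == digit_sums):
                return False
            for z in range(limit):
                # masks(z, y) implies z <= y, so x + y - z is non-negative here
                via_masking = masks(z, y) and masks(y, x + y - z)
                if via_masking != (z == (x & y)):
                    return False
    return True
```

Running `check(32)` exercises every triple below 32 and is a cheap sanity check on the reconstructed formulas.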

Corollary 5. *Every relation* $R \subseteq \mathbb{N}^n$ *is r.e. if and only if it is* ∃*-definable in the structure* $\langle \mathbb{N}; 0, 1, +, EqNZB, \frown, = \rangle$*.*

## 5 Conclusion

The purpose of this paper is to emphasize similarities in existential first-order characterizations of the languages recognizable by various abstract machines. Such descriptions in Sections 3 and 4 allowed us (in some sense) to answer the question of Bès [2, Open Problems] concerning the expressive power of fragments of FO-arithmetic with the predicate EqNZB.

Let us mention one natural question which is related to Theorems 1 and 3. Villemaire proves [23,24] that multiplication is definable in $\langle \mathbb{N}; 0, 1, +, V\_k, V\_l, = \rangle$ when k and l are multiplicatively independent. Bès strengthens this result [1] by showing that the same is true when $V\_l$ is replaced by any l-recognizable relation $R\_l$ that is not definable in $\langle \mathbb{N}; 0, 1, +, = \rangle$. It would be interesting to see whether multiplication is *existentially* definable in $\langle \mathbb{N}; 0, 1, +, \&\_k, \&\_l, = \rangle$, and more generally, to study ∃-definability in the structures $\langle \mathbb{N}; 0, 1, +, \&\_k, R\_l, = \rangle$.

Acknowledgements. The author is grateful to the anonymous reviewers for their useful suggestions and comments.

## References



### Coverability in 2-VASS with One Unary Counter is in NP<sup>*</sup>

Filip Mazowiecki<sup>1</sup>, Henry Sinclair-Banks<sup>2</sup>, and Karol Węgrzycki<sup>3</sup>

<sup>1</sup> University of Warsaw, Warsaw, Poland, f.mazowiecki@mimuw.edu.pl

<sup>2</sup> Centre for Discrete Mathematics and its Applications (DIMAP) & Department of Computer Science, University of Warwick, Coventry, UK, h.sinclair-banks@warwick.ac.uk

<sup>3</sup> Saarland University and Max Planck Institute for Informatics, Saarbrücken, Germany, wegrzycki@cs.uni-saarland.de

Abstract. Coverability in Petri nets finds applications in verification of safety properties of reactive systems. We study coverability in the equivalent model: Vector Addition Systems with States (VASS).

A k-VASS can be seen as k counters and a finite automaton whose transitions are labelled with k integers. Counter values are updated by adding the respective transition labels. A configuration in this system consists of a state and k counter values. Importantly, the counters are never allowed to take negative values. The coverability problem asks whether one can traverse the k-VASS from the initial configuration to a configuration with at least the counter values of the target.

In a well-established line of work on k-VASS, coverability in 2-VASS is already PSPACE-hard when the integer updates are encoded in binary. This lower bound limits the practicality of applications, so it is natural to focus on restrictions. In this paper we initiate the study of 2-VASS with one unary counter. Here, one counter receives binary encoded updates and the other receives unary encoded updates. Our main result is that coverability in 2-VASS with one unary counter is in NP. This improves upon the inherited state-of-the-art PSPACE upper bound. Our main technical contribution is that one only needs to consider runs in a certain compressed linear form.

Keywords: Vector Addition Systems · Coverability Problem · Linear Path Schemes

## 1 Introduction

Vector Addition Systems with States (VASS) are a well-studied class of infinite-state systems (see the survey [37]). These are finite automata with counters that can be updated, but are never allowed to take negative values. Thus, a configuration consists of a state and a vector over the natural numbers. The central decision problems are the reachability and coverability problems. The reachability problem asks whether from a given start configuration one can reach the target configuration. The coverability problem is the same except that the target configuration need not be reached exactly: counter values are allowed to be greater. Both problems are not only mathematically elegant, but they have interesting theoretical applications [7] and implementations [6]. Coverability is provably a simpler problem that is better suited for applications; reachability tools are mostly applied to coverability benchmarks [14]. Yet coverability has applications in the verification of safety conditions in reactive systems [17,21]. Such systems may require additional data structures to be accurately represented, like counters for example. Safety conditions often boil down to whether a particular state can be reached as opposed to a particular configuration [8].

<sup>*</sup> Filip Mazowiecki is supported by the ERC grant INFSYS, agreement no. 950398. Henry Sinclair-Banks is supported by EPSRC Standard Research Studentship (DTP), grant EP/T5179X/1. Karol Węgrzycki is supported by the ERC grant TI-PEA agreement no. 850979.

© The Author(s) 2023

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 196–217, 2023. https://doi.org/10.1007/978-3-031-30829-1\_10

Coverability and reachability have been studied for decades. The equivalent model of Petri nets was introduced already in the sixties [34]. For general VASS, Lipton proved in 1976 an EXPSPACE lower bound that applies to both coverability and reachability [31]. Two years later, Rackoff proved a matching EXPSPACE upper bound for coverability [35]. Later in 1981, Mayr proved that reachability is decidable [32] without providing an upper bound for the algorithm. The construction was simplified by Kosaraju [24] and Lambert [25], and a recent series of papers by Leroux and Schmitz ended in 2019 by proving an Ackermann upper bound [27]. A matching Ackermann lower bound was published in 2021 by two independent groups [12,26].

Plenty of attention has been given to VASS with fixed dimension, that is when the number of counters k is invariable, denoted k-VASS. For fixed dimension VASS it matters much whether the counter updates are encoded in unary or binary. Already, Rackoff gives NL and PSPACE upper bounds for coverability in unary encoded and binary encoded k-VASS, respectively [35]. The coverability problem where there are no counters is just directed graph reachability that is NL-complete [3]. Thus, coverability in unary encoded k-VASS is NL-complete, for every fixed k. Coverability in binary encoded 1-VASS is in NC<sup>2</sup> [2], it can therefore be decided in deterministic polynomial time. If there are two or more binary counters, coverability is PSPACE-hard [5] via a reduction from reachability in bounded one-counter automata that is PSPACE-complete [18]. Therefore, coverability in binary encoded k-VASS is PSPACE-complete for every k ≥ 2. See Figure 1 for the complexities of coverability in VASS with a fixed number of unary and binary encoded counters. This is all in striking contrast to the reachability problem in fixed dimension VASS, since reachability in 8-VASS is already known to be nonelementary [13].

There is a prominent line of work on 2-VASS with various encodings. The seminal paper in 1979 of Hopcroft and Pansiot [23] shows reachability in 2-VASS is decidable, proving that the reachability set is effectively semi-linear. Moreover, in the same paper the authors show, by an example, that the 3-VASS reachability set need not be semi-linear. Later, this was improved as it was shown that for 2-VASS the reachability relation is effectively semi-linear [28]. This proof shows that every 2-VASS can be characterised by a *flat model*, i.e. where the underlying finite automaton does not contain nested cycles. A more careful analysis of that paper resulted in a PSPACE upper bound for reachability in binary encoded 2-VASS [5]. Since coverability in binary encoded 2-VASS is PSPACE-hard [5], the authors were able to conclude that both coverability and reachability are PSPACE-complete. Just as coverability demonstrated the difference encoding makes to complexity, so does reachability; later it was proved that reachability in unary encoded 2-VASS is NL-complete [16].

[Table: complexity of coverability by number of unary counters and number of binary counters.]

Fig. 1. The complexities of coverability in VASS with a fixed number of unary and binary encoded counters. All NL lower bounds arise from the zero counters case, here coverability is directed graph reachability and that is well known to be NL-complete [3]. In the case of one binary counter, regardless of the number of unary counters, we are aware only of this trivial NL lower bound. Furthermore, with one binary counter and at least two unary counters, we are not aware of a non-trivial upper bound (denoted "Open" in the table). When there are at least two binary counters and any number of unary counters, coverability is PSPACE-complete. The lower bound holds for 2-VASS with two binary counters [5] and the upper bound is given by Rackoff for any fixed dimension [35]. Recall that coverability in general VASS, where the number of counters is not fixed, is EXPSPACE-complete [35].

*Our Results and Techniques.* We consider the coverability problem for 2-VASS with one unary counter. Here, updates of one counter are encoded in binary and the updates of the other are encoded in unary, see Figure 2 for an example. Notice that the unary counter need not be limited to polynomially bounded values. Otherwise, the value of the unary counter could be encoded into the states for an instance of coverability in binary encoded 1-VASS. Furthermore, we do not impose any restrictions on the initial and the target configurations, i.e. both coordinates of these vectors are encoded in binary. Our main result is that coverability in 2-VASS with one unary counter is in NP.

Coverability in binary encoded k-VASS is PSPACE-complete, for k ≥ 2. The lower bound limits the practicality of applications. Therefore, it is sensible to consider restricted variations and quantify their complexity. We remark that coverability in fixed dimension VASS had widely-open complexity if there was exactly one binary counter and at least one unary counter. See Figure 1 for a summary of the known results.

Fig. 2. Example 2-VASS with one unary counter V . Consider the instance of coverability consisting of V , the initial configuration q(0, 1), and the target configuration q(0, 10). Consider the path π = λρ λρ ··· λρ ρ ··· ρ which induces a run in V from the initial configuration q(0, 1). There are 990 repetitions of the pair of cycles λρ to witness the configuration q(990, 1). The cycles alternate so both counters remain non-negative throughout the run. This is followed by 10 iterations of the cycle ρ so the configuration q(0, 11) is witnessed, achieving coverability of the target configuration q(0, 10).
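The arithmetic in this caption can be replayed with a few lines of Python. The cycle effects used below, (100, −1) for λ and (−99, 1) for ρ, are assumptions inferred from the numbers in the caption (990 repetitions of λρ add 990 to the binary counter, and 10 iterations of ρ remove 990 while adding 10 to the unary counter). The sketch treats each cycle as a single step, so it only checks non-negativity at cycle boundaries.

```python
def run_effects(start, effects):
    """Apply the effects in order; return the final vector, or None if a counter drops below zero."""
    b, u = start
    for db, du in effects:
        b, u = b + db, u + du
        if b < 0 or u < 0:
            return None
    return (b, u)


LAMBDA = (100, -1)   # assumed effect of the cycle λ (inferred, not stated in the caption)
RHO = (-99, 1)       # assumed effect of the cycle ρ (inferred, not stated in the caption)

# 990 alternations of λρ followed by 10 iterations of ρ, starting from (0, 1)
path = [LAMBDA, RHO] * 990 + [RHO] * 10
final = run_effects((0, 1), path)
covers_target = final is not None and final[0] >= 0 and final[1] >= 10
```

Under these assumptions the alternation matters: starting with ρ from (0, 1) would immediately make the binary counter negative.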

The natural starting point is the characterisation of runs via *linear path schemes* [4]. Intuitively, the authors prove that if coverability or reachability holds then there is a witnessing path of a specific shape. Namely, all paths can be characterised by a bounded language defined by a regular expression of the form $\tau\_0 \gamma\_1^{\ast} \tau\_1 \dots \tau\_{k-1} \gamma\_k^{\ast} \tau\_k$. Here $\tau\_0, \dots, \tau\_k$ are paths that connect disjoint cycles $\gamma\_1, \dots, \gamma\_k$. Since the language is bounded, checking if there is a path for a given expression essentially amounts to an instance of integer linear programming. In particular, the authors argue that both k and $|\tau\_0| + |\gamma\_1| + |\tau\_1| + \dots + |\tau\_{k-1}| + |\gamma\_k| + |\tau\_k|$ are pseudo-polynomially bounded [4]. However, a polynomial bound would immediately yield an NP upper bound as such a regular expression can be guessed. Given that coverability in 2-VASS with two binary counters is PSPACE-hard [5], we cannot simply directly apply the known results when dealing with 2-VASS with one binary and one unary counter. In Section 3, we provide a detailed discussion and a difficult yet motivating example in Figure 3.

To overcome this problem, we show that coverability can be witnessed by paths in *compressed linear form*. We relax the condition of the bounded language by allowing linear forms to be nested, provided that the exponents are fixed. Intuitively, an expression of the form $(\tau \gamma^{\ast} \tau')^{\ast}$ is still forbidden, but we allow for $(\tau \gamma^{e} \tau')^{\ast}$, where e is fixed but can be exponentially large (encoded using polynomially many bits). Such a form easily provides an NP upper bound.

We rely on two crucial observations to prove that we can focus on paths in compressed linear form. First, notice that the ∗ operation in a linear path scheme corresponds to iterating some cycle in the VASS. Since $\gamma\_1, \dots, \gamma\_k$ need to be short, one naturally focuses on short cycles. The issue is that there are exponentially many cycles of polynomial size. In Section 4 we prove that for coverability there are only polynomially many 'optimal' cycles. In Section 5 we deal with the problem when some cycle γ occurs many times in a linear path scheme witnessing coverability, resulting in a polynomial bound on k, the width of the linear path scheme. Then we prove that either we can merge some $\gamma\_i$ and $\gamma\_j$, thus reducing the width, or there is a cycle that has positive effect on one counter and non-negative effect on the other counter. Intuitively, in the latter case, we can reduce the problem to coverability in 1-VASS by pumping such a cycle that forces one counter to take an arbitrarily large value. Moreover, such a cycle is witnessed by a linear path scheme. Since we need to pump this cycle, we require compressed linear forms to describe the repetitions of the cycle.

We highlight that both our crucial observations rely on that we work with coverability, not reachability. We further highlight that we address these crucial observations through our technical contributions that often depend on the fact there is one unary counter.

*Further Related Work.* Asymmetric treatment of the counters has been already considered for VASS. Recall that Minsky machines can be seen as VASS with the additional ability of zero-testing. For this model coverability is undecidable [33], even with two counters. This raised natural questions of what happens where only one of the counters is able to be reset or tested for zero. This, and more generally, reachability in VASS with hierarchical zero-tests are known to be decidable [36]. There is a further investigation into VASS with one zero-test [20]. Recently, work has appeared containing detailed analysis about 2-VASS where counters have different powers [19,29]. Finally, one of the most famous open problems in the community is whether reachability is decidable for 1-VASS with a pushdown stack. For these systems, coverability is known to be decidable [30]. The best known lower bound is that coverability, thus reachability also, is PSPACE-hard [15]. Our model, 2-VASS with one unary counter, can be seen as 1-VASS with a singleton alphabet pushdown stack.

The complexity of reachability in binary encoded 3-VASS remains an intriguing open problem. It is PSPACE-hard, like in dimension two, and the only known upper bound is primitive recursive, but not even elementary [27]. Recent works on reachability in fixed dimension VASS [11,9,13] provide new examples and a better understanding of the VASS model. Interestingly, many techniques applied to fixed dimension VASS are very closely related to recent progress on the nonelementary and Ackermann lower bounds for general VASS [10,12,26]. We finally and additionally motivate coverability in VASS with one binary counter and (at least) one unary counter as an avenue for finding new techniques to approach VASS problems with.

## 2 Preliminaries

Given an integer $z \in \mathbb{Z}$ we denote $\mathrm{bitsize}(z) = \lceil \log\_2(|z| + 1) \rceil + 1$. For a vector $\mathbf{v} := (v\_1, v\_2)$ we use $(\mathbf{v})\_1 := v\_1$ and $(\mathbf{v})\_2 := v\_2$ to be the projections to the first and second coordinates, respectively. We use $|\mathbf{v}|\_{\max} := \max\{|v\_1|, |v\_2|\} + 1$ to denote the size of the vector $\mathbf{v}$. We write $\mathbf{v} \le \mathbf{w}$ if the inequalities hold on each coordinate. We write $\mathbf{v} < \mathbf{w}$ if, in addition, at least one of the inequalities is strict.

A *2-VASS with one unary counter* $V = (Q, T)$ consists of a finite set of control *states* Q and a set of *transitions* $T \subseteq Q \times \mathbb{Z} \times \{-1, 0, 1\} \times Q$. We shall refer to the first counter as the *binary counter* and the second counter as the *unary counter*. The size of V is $|V| = |Q| + \sum\_{(p,b,u,q) \in T} \mathrm{bitsize}(b)$. With $|V|\_{\max} := |Q| + |T| \cdot |T|\_{\max}$ we denote the total 'pseudo-polynomial size' of the automaton, where $|T|\_{\max}$ denotes the maximum absolute value that occurs in the transitions. Note that in a standard 2-VASS both counters are in binary, i.e. the domain of updates for the second counter is also $\mathbb{Z}$.
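The two size measures are easy to compare on a small instance. The Python sketch below represents a transition as a tuple (source, binary update, unary update, target) and assumes that bitsize rounds the logarithm up, in which case it equals the bit length of |z| plus one. The function names are illustrative and not taken from the paper.

```python
def bitsize(z: int) -> int:
    """Assumed reading: ceil(log2(|z| + 1)) + 1, which equals |z|.bit_length() + 1."""
    return abs(z).bit_length() + 1


def size(states, transitions) -> int:
    """|V| = |Q| + sum of bitsize(b) over all transitions (p, b, u, q)."""
    return len(states) + sum(bitsize(b) for (_p, b, _u, _q) in transitions)


def pseudo_size(states, transitions) -> int:
    """|V|_max = |Q| + |T| * |T|_max, with |T|_max the largest absolute update."""
    t_max = max((max(abs(b), abs(u)) for (_p, b, u, _q) in transitions), default=0)
    return len(states) + len(transitions) * t_max
```

For a single state with updates (100, −1) and (−99, 1), the size is 17 while the pseudo-polynomial size is 201, which shows how binary encoded updates separate the two measures.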

A *path* π in V is a, possibly empty, sequence of transitions π = (ti)<sup>m</sup> <sup>i</sup>=1 such that t<sup>i</sup> = (q<sup>i</sup>−<sup>1</sup>, bi, ui, qi) ∈ T. A path is *simple* if q0,...,q<sup>m</sup> are distinct. A path is a *cycle* if q<sup>0</sup> = q<sup>m</sup> and m > 0 (thus empty cycles are forbidden). We call it a q0-cycle to emphasise the first and last state of the cycle. A cycle is *simple* if q1,...,q<sup>m</sup> are distinct. A cycle is *short* if m ≤ |Q|. The *length* of a path is the number of transitions in the path, denoted len(π) = m. We write π[i..j] to denote the path that is the subsequence of transitions (ti,...,t<sup>j</sup> ) in π.

A *configuration* $(p, \mathbf{u}) \in Q \times \mathbb{N}^2$, denoted $p(\mathbf{u})$, is a state paired with the current binary and unary counter values. A *run* is a sequence of configurations $(q_i(\mathbf{v}_i))_{i=0}^{m}$ such that $(q_{i-1}, (\mathbf{v}_i)_1 - (\mathbf{v}_{i-1})_1, (\mathbf{v}_i)_2 - (\mathbf{v}_{i-1})_2, q_i) \in T$ for every $1 \le i \le m$. A run can equivalently be defined by the sequence of configurations induced by following a path $\pi$ starting from an initial configuration $q_0(\mathbf{v}_0)$. We denote this run $q_0(\mathbf{v}_0) \xrightarrow{\pi} q_m(\mathbf{v}_m)$. We also write $q_0(\mathbf{v}_0) \xrightarrow{*} q_m(\mathbf{v}_m)$ to indicate the existence of a run between two configurations.
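The correspondence between paths and runs can be sketched as follows. This is an illustrative Python fragment only: the function name `run` and the encoding of transitions as plain tuples `(src, b, u, dst)` are assumptions, not notation from the paper.

```python
def run(start, path):
    """Follow `path` from the configuration `start` = (state, (binary, unary)).

    Returns the induced list of configurations, or None if the path is not
    connected to the current state or a counter would drop below zero.
    """
    state, (x, y) = start
    configs = [start]
    for (src, b, u, dst) in path:
        if src != state:
            return None          # transition does not start in the current state
        x, y = x + b, y + u
        if x < 0 or y < 0:
            return None          # configurations live in Q x N^2
        state = dst
        configs.append((state, (x, y)))
    return configs
```

For instance, the path with updates $(5, -1)$ and then $(-2, 1)$ induces a run from $p(0, 1)$ but not from $p(0, 0)$, since the unary counter would become negative after the first transition.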

In this paper we study the *coverability* problem for VASS.

#### VASS Coverability

INPUT: A VASS $V = (Q, T)$ and two configurations $p(\mathbf{u})$ and $q(\mathbf{v})$.

QUESTION: Does $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v}')$ hold, for some $\mathbf{v}' \ge \mathbf{v}$?

Note that the initial configuration $p(\mathbf{u})$ and the target configuration $q(\mathbf{v})$ have both the binary and unary components encoded as binary integers. The *reachability problem* for VASS, which we will not study in this paper, requires $\mathbf{v}' = \mathbf{v}$.

Consider a path $\pi = (t_i)_{i=1}^{m}$, where $t_i = (q_{i-1}, b_i, u_i, q_i)$. The *effect* of $\pi$ is the sum of the counter updates, i.e. the vector $\text{eff}(\pi) := \sum_{i=1}^{m} (b_i, u_i)$. We often focus on the two projections: the *binary effect* $\text{eff}_b(\pi) := \sum_{i=1}^{m} b_i$, and the *unary effect* $\text{eff}_u(\pi) := \sum_{i=1}^{m} u_i$.

We say that a cycle $\gamma$ is *monotone* if $\text{eff}(\gamma) \ge \mathbf{0}$ or $\text{eff}(\gamma) \le \mathbf{0}$. Otherwise, we say that $\gamma$ is *non-monotone*. Note the two variants of a non-monotone cycle: a *positive-negative* cycle has $\text{eff}_b(\gamma) > 0$ and $\text{eff}_u(\gamma) < 0$, and a *negative-positive* cycle has $\text{eff}_b(\gamma) < 0$ and $\text{eff}_u(\gamma) > 0$.
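The classification of cycles by their effect can be made concrete with a short Python sketch. It represents a cycle only by its list of update pairs $(b_i, u_i)$, which is a simplifying assumption; the function names are hypothetical.

```python
def effect(path):
    """Sum of the counter updates of a path given as a list of (b, u) pairs."""
    return (sum(b for b, _ in path), sum(u for _, u in path))


def classify(cycle):
    """Return 'monotone', 'positive-negative' or 'negative-positive'."""
    eff_b, eff_u = effect(cycle)
    if (eff_b >= 0 and eff_u >= 0) or (eff_b <= 0 and eff_u <= 0):
        return "monotone"
    # Non-monotone: the two coordinates have strictly opposite signs.
    return "positive-negative" if eff_b > 0 else "negative-positive"
```

Note that a cycle with zero effect on one coordinate is always monotone under this definition, regardless of the sign of the other coordinate.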

Let $\gamma$ be a cycle. Given $e \in \mathbb{N}$ we write $\gamma^e$ for the path obtained by $e$ repetitions of $\gamma$. We refer to $e$ as the *exponent*. A linear path scheme is a regular expression of the form $\tau_0 \gamma_1^* \tau_1 \cdots \tau_{k-1} \gamma_k^* \tau_k$, where the paths $\tau_0, \tau_1, \ldots, \tau_k$ connect disjoint cycles $\gamma_1, \ldots, \gamma_k$. Note that a collection of cycles is disjoint if no two cycles have a common state. Given $\ell = (\tau_0, \gamma_1, \tau_1, \ldots, \tau_{k-1}, \gamma_k, \tau_k)$, we say that a path $\pi$ is in linear form $\ell$ if $\pi = \pi_\ell = \tau_0 \gamma_1^{e_1} \tau_1 \cdots \tau_{k-1} \gamma_k^{e_k} \tau_k$ for some exponents $e_1, \ldots, e_k$. Note that in this definition every path has a linear form, e.g. $\tau_0 = \pi$ is valid. To leverage the definition, we will ask whether paths are in a linear form of certain size. The size of a linear form $\ell$ is $\sum_{i=0}^{k} \text{len}(\tau_i) + \sum_{i=1}^{k} \text{len}(\gamma_i)$. The size of $\pi_\ell$ is $\sum_{i=0}^{k} \text{len}(\tau_i) + \sum_{i=1}^{k} \text{len}(\gamma_i) + \sum_{i=1}^{k} \text{bitsize}(e_i)$, i.e. it includes the exponents. We refer to $k$ as the *width* of the linear form.
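To make the difference between the length of a path and the size of its linear form concrete, here is a small Python sketch. The names `expand`, `form_size` and `path_size` are hypothetical, transitions are opaque list elements, and `bitsize` rounds the logarithm up (an assumption).

```python
def bitsize(z):
    # Assumption: logarithm rounded up, as in ceil(log2(|z| + 1)) + 1.
    return abs(z).bit_length() + 1


def expand(taus, gammas, exps):
    """Unfold tau_0 gamma_1^e_1 tau_1 ... gamma_k^e_k tau_k into one path.

    `taus` holds k + 1 paths, `gammas` and `exps` hold k entries each.
    """
    path = list(taus[0])
    for gamma, e, tau in zip(gammas, exps, taus[1:]):
        path += list(gamma) * e + list(tau)
    return path


def form_size(taus, gammas):
    # Size of the linear form: lengths of all paths and all cycles.
    return sum(len(t) for t in taus) + sum(len(g) for g in gammas)


def path_size(taus, gammas, exps):
    # Size of the path in linear form: additionally counts exponent bits.
    return form_size(taus, gammas) + sum(bitsize(e) for e in exps)
```

The unfolded path has length linear in the exponents, while `path_size` only grows with their number of bits; this gap is what makes linear forms succinct witnesses.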

## 3 Coverability in 2-VASS with One Unary Counter

In this section we briefly discuss why the state-of-the-art techniques are not enough to prove that coverability in 2-VASS with one unary counter is in NP. Blondin et al. [4] show that for a given 2-VASS $V$ there exists a set of linear path schemes $S$ such that if $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v})$ in $V$, then there exists a path $\pi$ in a linear path scheme $\rho \in S$ such that $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v})$. For every linear path scheme $\rho \in S$ the width of $\rho$, and therefore the width of every path, is bounded above by $\text{poly}(|Q|, |T|_{\max})$ [4, Theorem 3.1]. Such a path $\pi$ is not necessarily a polynomial size witness, as the width depends polynomially on $|T|_{\max}$, which may be exponential in $|V|$. We provide an example of a 2-VASS with one unary counter where the width of every linear form for a path is exponential in the input size. This demonstrates that the combinatorial structure of linear path schemes alone does not suffice to show that there always exists a polynomial size witness of coverability.

Fig. 3. Example 2-VASS with one unary counter $V$, where $N = 2^n$ and $n$ is an input parameter (thus making $N$ exponentially large). Consider the coverability instance with the initial configuration $q(0, 1)$ and the target configuration $q(N, 1)$. Let $\lambda = t_{qa}\, \alpha^{N^2}\, t_{ab}\, \beta^{N^2}\, t_{bq}$ and $\rho = t_{qp}\, t_{pc}\, \gamma^{N^2}\, t_{cd}\, \delta^{N^2}\, t_{dq}$, where $t_{xy}$ is the transition from state $x$ to state $y$. Observe that $\text{eff}(\lambda) = (N, -1)$ and $\text{eff}(\rho) = (-N + 1, 1)$, thus $\text{eff}(\lambda\rho) = (1, 0)$. It is then easy to see that $q(0, 1) \xrightarrow{(\lambda\rho)^N} q(N, 1)$. Intuitively the cycles $\lambda$ and $\rho$ alternate so both counters remain non-negative throughout the run. In the appendix, we prove that there does not exist a linear form of polynomial size for a path that induces a coverability run.
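The alternation of $\lambda$ and $\rho$ can be checked arithmetically with the following Python sketch. It is a coarse illustration only: it treats $\lambda$ and $\rho$ as atomic steps with the effects stated in the caption and inspects the counters at the boundaries between them, since the individual transitions of the figure are not reproduced here. The function name `boundary_run` is hypothetical.

```python
def boundary_run(n):
    """Apply (lambda rho)^N from (0, 1) with N = 2**n, using only the effects
    eff(lambda) = (N, -1) and eff(rho) = (-N + 1, 1).

    Returns the final counter values, or None if a counter is negative at
    some boundary between consecutive cycles.
    """
    N = 2 ** n
    lam, rho = (N, -1), (-N + 1, 1)
    x, y = 0, 1
    for _ in range(N):
        for (db, du) in (lam, rho):
            x, y = x + db, y + du
            if x < 0 or y < 0:
                return None
    return (x, y)
```

Each round of $\lambda\rho$ raises the binary counter by one and restores the unary counter to $1$, so after $N$ rounds the values are $(N, 1)$.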

Paths in Compressed Linear Form. Nevertheless, there is a natural way to succinctly describe the path presented in Figure 3. Let σ = λρ, and note that

$$
\sigma^N = \left( t_{qa} \, \alpha^{N^2} \, t_{ab} \, \beta^{N^2} \, t_{bq} \, t_{qp} \, t_{pc} \, \gamma^{N^2} \, t_{cd} \, \delta^{N^2} \, t_{dq} \right)^N .
$$

All paths and cycles are 'small', and the bitsize of N and N<sup>2</sup> are polynomial in n, so σ itself is a path in linear form. We introduce the following generalisation of linear form paths that encapsulates the idea behind paths of this kind of arrangement.

Definition 1 (Compressed linear form path). A path $\pi$ is in compressed linear form if $\pi = \rho_0 \sigma_1^{f_1} \rho_1 \cdots \rho_{k-1} \sigma_k^{f_k} \rho_k$ for some connected paths in linear form $\rho_0, \rho_1, \ldots, \rho_k$; cycles in linear form $\sigma_1, \ldots, \sigma_k$; and exponents $f_1, \ldots, f_k$. The size of a compressed linear form path is the sum of the sizes of all $\rho_i$ and $\sigma_i$ (including the bitsize of their exponents) plus the bitsize of the exponents $f_i$.

Fig. 4. A compressed linear form path.

The following theorem is our main contribution.

Theorem 1. Let $V$ be a 2-VASS with one unary counter and fix two configurations $p(\mathbf{u})$ and $q(\mathbf{v})$. If $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v})$, then there exists a path in compressed linear form $\pi$ such that $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v}')$ and $\mathbf{v}' \ge \mathbf{v}$. The size of the compressed linear form path is polynomial in $|V| + \text{bitsize}(\mathbf{u}) + \text{bitsize}(\mathbf{v})$.

Corollary 1. Coverability in 2-VASS with one unary counter is in NP.

Proof. By Theorem 1 it suffices to consider paths in compressed linear form of polynomial size, that can be guessed in NP. It suffices to observe that a coverability instance on a given compressed linear form amounts to an instance of integer linear programming. Intuitively, this is because the nested cycles are fixed. Thus to check whether a run drops below zero it suffices to check before applying a cycle and after applying it for the last time (see e.g. [5, Section V, Lemma 14]).
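The remark that it suffices to check a cycle before its first and after its last iteration can be illustrated for fixed exponents by the following Python sketch. It is not the integer linear programming formulation from the proof; it only shows how a path with nested exponents can be verified without unfolding it. Every path is summarised by a pair (effect, lowest prefix sum), and all function names are hypothetical.

```python
def step(b, u):
    """Summary (effect, lowest prefix sum) of a single transition."""
    return (b, u), (min(b, 0), min(u, 0))


def seq(*parts):
    """Summary of the concatenation of the summarised paths in `parts`."""
    eff, low = (0, 0), (0, 0)
    for (e, g) in parts:
        low = tuple(min(low[i], eff[i] + g[i]) for i in range(2))
        eff = tuple(eff[i] + e[i] for i in range(2))
    return eff, low


def power(summary, e):
    """Summary of e repetitions of a cycle.

    The prefix sums at the start of each iteration form an arithmetic
    progression, so the lowest point lies in the first or last iteration.
    """
    eff, low = summary
    if e == 0:
        return (0, 0), (0, 0)
    total = tuple(e * eff[i] for i in range(2))
    lowest = tuple(min(low[i], (e - 1) * eff[i] + low[i]) for i in range(2))
    return total, lowest


def executable(start, summary):
    """True if both counters stay non-negative when starting from `start`."""
    _, low = summary
    return all(start[i] + low[i] >= 0 for i in range(2))
```

For example, with $\lambda$ and $\rho$ abstracted to single steps of effect $(N, -1)$ and $(-N + 1, 1)$, the summary of $(\lambda\rho)^N$ is obtained by `power(seq(step(N, -1), step(-N + 1, 1)), N)` in time independent of $N$.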

We highlight that it is rather unexpected that only one extra 'level' of linear form paths is enough to obtain polynomial size witnesses of coverability in a 2-VASS with one unary counter, since the problem is PSPACE-complete for general 2-VASS. Roughly speaking, the example given in Figure 3 exhibits the most complex behaviour possible and this instance of coverability is witnessed by a compressed linear form path. More specifically, compressed linear form paths containing only one linear form cycle suffice as witnesses for coverability in 2-VASS with one unary counter. Therefore, all witnesses can be represented by a compressed linear form path $\rho\sigma^N\tau$ where $\rho$ and $\tau$ are linear form paths to and from the single linear form cycle $\sigma$, which is iterated $N$ times.

The rest of the paper is dedicated to proving Theorem 1. We heavily exploit both distinguishing features of the problem: the fact that one counter receives unary encoded updates (as opposed to both counters in binary) and the fact that we aim to assert coverability (as opposed to reachability). Our approach is as presented in the introduction. In Section 4 we observe that we can polynomially bound the total number of distinct short cycles. We formalise this and show that there are only polynomially many 'irreplaceable' short cycles. In Section 5 we provide a 'reshuffling procedure'. If some short cycle $\gamma$ repeats exponentially many times we aim to modify the path $\pi$ by moving the cycles $\gamma$ close to each other. Then either every short cycle $\gamma$ will appear only in polynomially many 'bundles' $\gamma^e$, or we find a cycle $\sigma$ such that $\text{eff}(\sigma) > \mathbf{0}$. In the latter case, by pumping $\sigma$ we are essentially left with one counter. Finally, in Section 6 we conclude the proof of Theorem 1.

## 4 Replacing Short Cycles

In this section, we show that there are only polynomially many short cycles that need occur in a run witnessing coverability. Fix a path $\pi = (q_{i-1}, b_i, u_i, q_i)_{i=1}^{k}$. Let $0 \le i_b, i_u \le k$ be the first indices such that $g_b = \sum_{i=1}^{i_b} b_i$ and $g_u = \sum_{i=1}^{i_u} u_i$ are at their lowest, respectively. Note that $g_b, g_u \le 0$ since, by convention, the sum evaluates to $0$ when $i_b = 0$ or $i_u = 0$. We call these two numbers the *binary guard* $\text{grd}_b(\pi) = g_b$ and the *unary guard* $\text{grd}_u(\pi) = g_u$. The following claim immediately follows from these definitions.

*Claim 1.* Both $\text{grd}_b(\pi[i_b + 1..k]) = 0$ and $\text{grd}_u(\pi[i_u + 1..k]) = 0$.

Much like the *nadir* of a cycle in a one-counter net, defined in [1], we define the *binary-nadir state* as $q_{i_b}$, i.e. the state in which the binary counter first attains its lowest value when executing $\pi$. We call $\pi = \pi^b_1 \pi^b_2$ the *binary-nadir decomposition*, for $\pi^b_1 = \pi[1..i_b]$ and $\pi^b_2 = \pi[i_b + 1..k]$, as intimated in Claim 1. Notice that this decomposition ensures that the binary guard of the path $\pi$ is equal to the binary effect of the prefix $\pi^b_1$: $\text{grd}_b(\pi) = \text{eff}_b(\pi^b_1) = \text{grd}_b(\pi^b_1)$. Furthermore, the suffix of the binary-nadir decomposition has zero binary guard, $\text{grd}_b(\pi^b_2) = 0$. We primarily utilise binary-nadir states and binary-nadir decompositions, hence the omission of matching unary-nadir states and unary-nadir decompositions.
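Guards and the binary-nadir decomposition can be computed in a single pass, as in the following Python sketch. Paths are represented only by their list of update pairs $(b_i, u_i)$, and the function names are hypothetical.

```python
def guard_and_index(updates):
    """Lowest prefix sum of `updates` (the empty prefix counts as 0) together
    with the first index at which this lowest value is attained."""
    best, best_i, total = 0, 0, 0
    for i, d in enumerate(updates, start=1):
        total += d
        if total < best:          # strict: keeps the *first* lowest index
            best, best_i = total, i
    return best, best_i


def binary_nadir_decomposition(path):
    """Split a path, given as (b, u) pairs, at the first lowest point of the
    binary counter: returns (pi[1..i_b], pi[i_b + 1..k])."""
    _, i_b = guard_and_index([b for b, _ in path])
    return path[:i_b], path[i_b:]
```

On the updates $(2,0), (-5,1), (1,-1), (-1,0), (4,1)$ the binary prefix sums are $2, -3, -2, -3, 1$, so the binary guard is $-3$ with $i_b = 2$, and the suffix after the split has binary guard $0$, in line with Claim 1.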

Definition 2 (Replaceable cycles). *Let* $\gamma$ *be a* $q$*-cycle and let* $p$ *be the binary-nadir state of* $\gamma$*. We say that* $\gamma$ *is replaceable if there exists a* $q$*-cycle* $\gamma'$ *with the same binary-nadir state* $p$*, such that*


*Additionally, at least one inequality is strict and we write* $\gamma \prec \gamma'$*.*

We say a cycle is *irreplaceable* if it is not replaceable. We also say that an irreplaceable $q$-cycle $\gamma$ with the binary-nadir state $p$ is *characterised* by the five values: $\text{eff}_b(\gamma)$, $\text{eff}_u(\gamma)$, $\text{grd}_b(\gamma)$, $\text{grd}_u(\gamma)$, and $\text{len}(\gamma)$.

Lemma 1 (Replacing cycles). *Let* $\pi = \pi_1 \gamma \pi_2$*, where* $\gamma$ *is a* $q$*-cycle. Suppose* $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v})$*; then the following hold.*


*In both cases* $\mathbf{v}' \ge \mathbf{v}$ *and* $\text{len}(\pi) \ge \text{len}(\pi_1 \gamma' \pi_2)$*.*

For convenience, we define the polynomial $R(|Q|) := |Q|^4 (|Q| + 1)(2|Q| + 1)^2$.

Lemma 2. *There exist at most* $R(|Q|)$ *irreplaceable short cycles with different characterisations.*

*Proof.* We fix two states $q$ and $p$ and consider only $q$-cycles $\gamma$ with the binary-nadir state $p$. Thus in the final argument one must multiply everything by $|Q|^2$. Since we consider short cycles, the unary effect and the unary guard are small, i.e. $-|Q| \le \text{eff}_u(\gamma) \le |Q|$ and $-|Q| \le \text{grd}_u(\gamma) \le 0$.

Towards a contradiction, suppose there exist more than $|Q|^2 (|Q| + 1)(2|Q| + 1)^2$ such irreplaceable $q$-cycles with different characterisations. By the pigeonhole principle there must exist two cycles, denoted in binary-nadir decomposition $\gamma = \gamma_1 \gamma_2$ and $\gamma' = \gamma'_1 \gamma'_2$, that have the same values $\text{eff}_u(\gamma_1) = \text{eff}_u(\gamma'_1)$, $\text{eff}_u(\gamma_2) = \text{eff}_u(\gamma'_2)$, $\text{grd}_u(\gamma) = \text{grd}_u(\gamma')$, $\text{len}(\gamma_1) = \text{len}(\gamma'_1)$, and $\text{len}(\gamma_2) = \text{len}(\gamma'_2)$.

We know that the irreplaceable $q$-cycles $\gamma$ and $\gamma'$ have different characterisations, so it must be the case that their binary effects differ, $\text{eff}_b(\gamma) \ne \text{eff}_b(\gamma')$. Otherwise, the cycle with the lesser binary guard is replaceable, because the unary effect, unary guard, and length do not differ. Without loss of generality, suppose $\text{eff}_b(\gamma) > \text{eff}_b(\gamma')$; then $\text{grd}_b(\gamma) < \text{grd}_b(\gamma')$. Otherwise, $\gamma'$ would be replaceable as $\gamma' \prec \gamma$.

Now consider the $q$-cycle $\sigma = \gamma'_1 \gamma_2$, also with the binary-nadir state $p$. We will show that $\gamma \prec \sigma$, contradicting the fact that $\gamma$ is an irreplaceable $q$-cycle. First, observe that $\sigma$ has greater binary effect than $\gamma$ as

$$\text{eff}_b(\sigma) = \text{eff}_b(\gamma_1') + \text{eff}_b(\gamma_2) > \text{eff}_b(\gamma_1) + \text{eff}_b(\gamma_2) = \text{eff}_b(\gamma),$$

where the inequality holds because $\text{eff}_b(\gamma_1) = \text{grd}_b(\gamma) < \text{grd}_b(\gamma') = \text{eff}_b(\gamma'_1)$. Second, $\sigma$ and $\gamma$ have equal unary effect because $\text{eff}_u(\gamma'_1) = \text{eff}_u(\gamma_1)$. Third, we show that $\sigma$ has a greater binary guard than $\gamma$. Since $\gamma_2$ is the suffix of the binary-nadir decomposition of $\gamma$, it must be true that $\text{grd}_b(\gamma_2) = 0$. By Claim 1, $\text{grd}_b(\sigma) = \text{grd}_b(\gamma'_1)$. Combining these facts, $\text{grd}_b(\sigma) = \text{grd}_b(\gamma') > \text{grd}_b(\gamma)$. Fourth, $\sigma$ has at least the unary guard of $\gamma$ because, in particular, the unary guard of a prefix of a path is at least the unary guard of the entire path:

$$\begin{aligned} \text{grd}_u(\sigma) &= \min \{ \text{grd}_u(\gamma_1'), \text{eff}_u(\gamma_1') + \text{grd}_u(\gamma_2) \} \\ &\ge \min \{ \text{grd}_u(\gamma'), \text{eff}_u(\gamma_1') + \text{grd}_u(\gamma_2) \} \\ &= \min \{ \text{grd}_u(\gamma), \text{eff}_u(\gamma_1) + \text{grd}_u(\gamma_2) \} = \text{grd}_u(\gamma). \end{aligned}$$

Fifth and finally, $\sigma$ and $\gamma$ have equal length because $\text{len}(\gamma'_1) = \text{len}(\gamma_1)$. We have at least one strict inequality. Thus, we have reached the desired contradiction.

## 5 Reshuffling Linear Form Paths

### 5.1 Reshuffling Procedure

There can be many linear forms for a path $\pi$. We will try to find an 'optimal' one, so we introduce a cost function to quantify linear forms. Recall that a linear form is a sequence of paths $\tau_0, \tau_1, \ldots, \tau_k$ and a sequence of cycles $\gamma_1, \ldots, \gamma_k$. If $\pi$ is in the linear form $\ell = (\tau_0, \gamma_1, \tau_1, \ldots, \tau_{k-1}, \gamma_k, \tau_k)$ then we write $\pi_\ell = \tau_0 \gamma_1^{e_1} \tau_1 \cdots \tau_{k-1} \gamma_k^{e_k} \tau_k$, where $\pi = \pi_\ell$ (the index $\ell$ is here to stress the exact linear form). For this section, we will consider linear forms only containing short cycles $\gamma$, which will play a key role in the following arguments.

We define a cost function that assigns, to a linear form $\ell$, the following pair of naturals $C(\ell) := \left( \sum_{i=0}^{k} \text{len}(\tau_i),\, k \right)$. For convenience, we define the polynomial $P(|Q|) := 2(|Q|^2 + 1)(|Q|^2 + 2) \cdot R(|Q|)$, where $R$ is the polynomial defined for Lemma 2. We say that a linear form $\ell$ is *narrow* if $C(\ell) \le (|Q|(P(|Q|) + 1), P(|Q|))$, otherwise we say that $\ell$ is *wide*. We say that the triple $(\pi', \sigma, \pi'')$ is a monotone cycle decomposition of a path $\pi$ if $\sigma$ is a monotone cycle, $\pi = \pi' \sigma \pi''$, and $\text{len}(\sigma) < \text{len}(\pi)$.
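The cost function and the narrowness test can be written down directly, as in the following Python sketch. Here $\le$ on pairs is read coordinate-wise, following the convention for vectors in the preliminaries; this reading, the function names, and the representation of a linear form by two lists are assumptions of the sketch.

```python
def R(q):
    # R(|Q|) = |Q|^4 (|Q| + 1)(2|Q| + 1)^2
    return q ** 4 * (q + 1) * (2 * q + 1) ** 2


def P(q):
    # P(|Q|) = 2(|Q|^2 + 1)(|Q|^2 + 2) * R(|Q|)
    return 2 * (q ** 2 + 1) * (q ** 2 + 2) * R(q)


def cost(taus, gammas):
    """C(l): total length of the connecting paths, and the number of cycles."""
    return (sum(len(t) for t in taus), len(gammas))


def is_narrow(taus, gammas, q):
    """Coordinate-wise comparison of C(l) with (|Q|(P(|Q|) + 1), P(|Q|))."""
    c1, c2 = cost(taus, gammas)
    return c1 <= q * (P(q) + 1) and c2 <= P(q)
```

Costs are plain Python tuples, so the built-in tuple comparison `cost(a) < cost(b)` orders them lexicographically, with the first coordinate taking priority.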

Lemma 3 (Reshuffling). *Let* $\pi$ *be a path such that* $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v})$*. Then there exists a path* $\rho$ *such that* $p(\mathbf{u}) \xrightarrow{\rho} q(\mathbf{w})$ *where* $\mathbf{w} \ge \mathbf{v}$*,* $\text{len}(\rho) \le \text{len}(\pi)$*, and either*


*Proof.* We start with a series of preparations. In the early part of this proof, we provide simple observations to establish some useful properties of our path. In the later part of this proof, we present the 'reshuffling procedure' and conclude with one of the cases in the statement of this lemma. In this proof we will compare linear forms using the lexicographic order $\prec_{lex}$, which is known to be a linear order and a well-order. Formally,

$$\begin{aligned} C(\ell') \prec_{lex} C(\ell) \iff{} & (C(\ell'))_1 < (C(\ell))_1, \text{ or} \\ & (C(\ell'))_1 = (C(\ell))_1 \text{ and } (C(\ell'))_2 < (C(\ell))_2. \end{aligned}$$

We start with a path $\pi'$ such that $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{v}')$ where $\mathbf{v}' \ge \mathbf{v}$, $\text{len}(\pi') \le \text{len}(\pi)$, and $\pi'$ has a linear form $\ell'$ that has the least cost among all linear forms for all such paths. That means there does not exist another path $\pi''$ such that $p(\mathbf{u}) \xrightarrow{\pi''} q(\mathbf{v}'')$ where $\mathbf{v}'' \ge \mathbf{v}$, $\text{len}(\pi'') \le \text{len}(\pi)$, and $\pi''$ has a linear form $\ell''$ such that $C(\ell'') \prec_{lex} C(\ell')$.

For the first observation, suppose there exists $0 \le i \le k$ such that $\text{len}(\tau_i) > |Q|$. Then some state repeats along $\tau_i$, so the path $\tau_i$ can be written as $\tau_i = \tau' \gamma \tau''$, where $\gamma$ is a short cycle. We can define the linear form $\ell''$ by modifying $\ell'$ so that $\tau_i$ is swapped for $\tau' \gamma \tau''$. Although this increments the number of cycles $k$, we decrease the total length of the paths as $\text{len}(\tau') + \text{len}(\tau'') < \text{len}(\tau_i)$ (recall that empty cycles are forbidden). Thus $C(\ell'') \prec_{lex} C(\ell')$, contradicting the assumption that $\ell'$ has minimum cost. Therefore, we assume that $\text{len}(\tau_i) \le |Q|$ for all $0 \le i \le k$.

For the second observation, we define $U := \{0 \le i \le m : (\mathbf{v}_i)_2 < |Q|\}$ to be the set of indices of configurations in the run that have unary counter value less than $|Q|$. Observe that if $|U| > |Q|^2 + 1$ then there are two indices $0 < i < j \le m$ such that the two corresponding configurations in the run have matching states $q_i = q_j$ and equal unary counter values $(\mathbf{v}_i)_2 = (\mathbf{v}_j)_2$. Then, regardless of the sign of its binary effect, $\pi'[i..j]$ is a monotone cycle. Here, case (ii) immediately holds by decomposing $\pi'$ itself using the monotone cycle $\pi'[i..j]$, given that $i > 0$ and $j \le m$ implies $\text{len}(\pi'[i..j]) = j - i < m = \text{len}(\pi')$. Therefore, we assume $|U| \le |Q|^2 + 1$. We continue with the aim of satisfying the conditions of case (ii) by finding a monotone cycle decomposition.
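The pigeonhole argument of the second observation corresponds to a simple search for a repeated pair of state and unary value, sketched below in Python. The function name and the encoding of a run as a list of `(state, (binary, unary))` pairs are assumptions; `bound` plays the role of $|Q|$.

```python
def find_repeated_low_configuration(run, bound):
    """Search the configurations with unary value below `bound` for two
    indices 0 < i < j with the same state and the same unary value.

    `run` is a list of (state, (binary, unary)). Returns (i, j) or None.
    The part of the run between i and j has unary effect 0, so the
    corresponding cycle is monotone whatever its binary effect is.
    """
    seen = {}
    for j in range(1, len(run)):      # index 0 is skipped so that i > 0
        state, (_, unary) = run[j]
        if unary >= bound:
            continue                  # index j is not in the set U
        key = (state, unary)
        if key in seen:
            return seen[key], j
        seen[key] = j
    return None
```

There are at most $|Q|^2$ distinct keys, which is why more than $|Q|^2 + 1$ indices in $U$ force a repetition even after discarding index $0$.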

Let $d = |\{\gamma_1, \ldots, \gamma_k\}|$ be the number of distinct cycles in the linear form $\ell'$. By Lemma 1 and Lemma 2, we can assume that $d \le R(|Q|)$. Otherwise, we can exchange replaceable $q$-cycles for irreplaceable $q$-cycles using the first point in Lemma 1. It is possible that for a particular characterisation, we can observe more than one irreplaceable $q$-cycle. Then using the second point in Lemma 1, we can arbitrarily select one of these irreplaceable $q$-cycles with equal characterisations to exchange all others with. By applying these cycle replacements to $\pi'$, we obtain a different path $\rho$. Definition 2 ensures that we do so without decreasing the effect (a), without allowing the counters to take a negative value (b), and without increasing the length of the path (c). Therefore $p(\mathbf{u}) \xrightarrow{\rho} q(\mathbf{w})$ and $\mathbf{w} \ge \mathbf{v}' \ge \mathbf{v}$, and $\text{len}(\rho) \le \text{len}(\pi') \le \text{len}(\pi)$. We remark that, since cycles have been exchanged one-for-one, $\rho$ takes a linear form $\ell$ with the same path segments as $\ell'$. Therefore, it is clear that neither the number of cycles $k$, nor the sum of the lengths of the paths between cycles, have changed. We also know that $\ell$ is a linear form for $\rho$ with minimum cost $C(\ell) = C(\ell')$, as per the initialisation in this proof.

Suppose $\rho = \rho_\ell = \tau_0 \gamma_1^{e_1} \tau_1 \cdots \tau_{k-1} \gamma_k^{e_k} \tau_k$. Let $(q_j(\mathbf{v}_j))_{j=0}^{m}$ be the run obtained by following the path $\rho$ from the initial configuration $q_0(\mathbf{v}_0) = p(\mathbf{u})$ to the final configuration $q_m(\mathbf{v}_m) = q(\mathbf{w})$. We may assume that $\ell$ is wide. Otherwise, case (i) is immediately satisfied. We also know that $\text{len}(\rho) \ge \max\{(C(\ell))_1, (C(\ell))_2\} > P(|Q|)$. We may also assume that each cycle $\gamma_1, \ldots, \gamma_k$ is non-monotone, i.e. it is positive-negative or negative-positive. Otherwise, case (ii) immediately holds by decomposing $\rho$ itself using some monotone cycle $\gamma_i$, given that $\text{len}(\gamma_i) \le |Q| < P(|Q|) < \text{len}(\rho_\ell)$. Notice this is valid since each $e_i > 0$ by the minimality of $C(\ell)$; otherwise one could write $\cdots \tau_{i-1} \gamma_i^0 \tau_i \cdots$ with one fewer cycle, decreasing $(C(\ell))_2$.

From the first observation, we get $\sum_{i=0}^{k} \text{len}(\tau_i) \le (k + 1)|Q|$. Given that $\ell$ is wide, either $|Q|(P(|Q|) + 1) < (C(\ell'))_1 = \sum_{i=0}^{k} \text{len}(\tau_i) \le (k + 1)|Q|$, which implies $P(|Q|) < k$, or $P(|Q|) < (C(\ell'))_2 = k$. Regardless, $P(|Q|) < k$ holds. Recall that $|U| \le |Q|^2 + 1$ from the second observation. Since there are relatively 'few' configurations indexed by $U$, there must exist a relatively 'distant' pair of consecutive configurations indexed by $U$. More formally, there are $i$ and $j$ such that $0 \le i < j \le k$ and $j - i \ge 2(|Q|^2 + 2)R(|Q|)$ and all configurations that occur in the run over the path segment $\tau_i \gamma_{i+1}^{e_{i+1}} \cdots \gamma_j^{e_j} \tau_j$ have unary counter value at least $|Q|$. Notice that $j - i$ is the number of cycles in this path segment. Since $j - i \ge 2(|Q|^2 + 2)R(|Q|)$, by the pigeonhole principle on the number of irreplaceable cycles there is a common irreplaceable cycle $\gamma$ repeated at least $x = 2(|Q|^2 + 2)$ many times. We will focus on the first $x$ such occurrences of this cycle. Let $s_1, \ldots, s_x$ be the indices of this cycle $\gamma$, i.e. $\gamma = \gamma_{s_1} = \cdots = \gamma_{s_x}$. To highlight these cycles, we decompose this path segment into

$$
\tau_i \gamma_{i+1}^{e_{i+1}} \cdots \gamma_j^{e_j} \tau_j = \Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{x-1} \gamma^{f_x} \Lambda_x,
$$

where $f_j := e_{s_j}$ and $\Lambda_j$ are the concatenated paths (and cycles) in between iterations of $\gamma$, see Figure 5. To reiterate, we know that all configurations that occur in the run over this path segment have at least $|Q|$ unary counter value and $\gamma$ is a short cycle.

Fig. 5. The decomposition of the path segment into $\Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{x-1} \gamma^{f_x} \Lambda_x$, as above. Notice that the unary counter is always at least $|Q|$ as no configurations indexed by $U$ are present.

Reshuffling Procedure. In the rest of the proof we will modify the path segment (above) of the path $\rho$ with a procedure that we call reshuffling. At the end of this procedure we will find a monotone cycle and satisfy case (ii) of this lemma. We either find this cycle directly, or we obtain a linear form $\ell''$ such that $C(\ell'') \prec_{lex} C(\ell)$, contradicting the assumption that $\ell$ has minimal cost.

Note that $x = 2(|Q|^2 + 2)$ is even. For every pair of consecutive cycles $\gamma_{2j-1}$ and $\gamma_{2j}$ (for $1 < 2j \le x$), consider the subsegment $\gamma^{f_{2j-1}} \Lambda_{2j-1} \gamma^{f_{2j}}$. There are two scenarios depending on the variant of the non-monotone cycle $\gamma$. In the scenario where $\gamma$ is positive-negative, we move an iteration of $\gamma$ from right to left obtaining $\gamma^{f_{2j-1}+1} \Lambda_{2j-1} \gamma^{f_{2j}-1}$. In the scenario where $\gamma$ is negative-positive, we move an iteration of $\gamma$ in the opposite direction obtaining $\gamma^{f_{2j-1}-1} \Lambda_{2j-1} \gamma^{f_{2j}+1}$.

We repeat this procedure until one of two conditions is met. The first is when there are no iterations of $\gamma$ on one side, so either $f_{2j-1}$ or $f_{2j}$ becomes $0$. The second is when there appears a configuration, in the run over the path subsegment after reshuffling, with unary counter value less than $|Q|$. See Figure 6 for a pictorial presentation of reshuffling in the scenario where $\gamma$ is positive-negative.
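The loop and its two stopping conditions can be sketched as follows in Python. The sketch tracks only the unary counter (the binary counter is handled by the monotonicity argument in the proof), unfolds the exponents explicitly and is therefore only meant for small examples; the function names and the order in which the two conditions are tested are assumptions.

```python
def min_unary(start_u, updates):
    """Lowest unary value seen along `updates`, starting from `start_u`."""
    low = cur = start_u
    for u in updates:
        cur += u
        low = min(low, cur)
    return low


def reshuffle(gamma_u, lam_u, f1, f2, start_u, q, positive_negative):
    """Move iterations of gamma across Lambda, one at a time.

    gamma_u, lam_u: lists of unary updates of gamma and of Lambda.
    f1, f2: positive exponents of gamma left and right of Lambda.
    Returns the final exponents and the stopping condition that was hit.
    """
    # Positive-negative cycles move from right to left, the others from
    # left to right.
    move = 1 if positive_negative else -1
    while True:
        f1, f2 = f1 + move, f2 - move
        if f1 == 0 or f2 == 0:
            return f1, f2, "one side empty"
        segment = gamma_u * f1 + lam_u + gamma_u * f2
        if min_unary(start_u, segment) < q:
            return f1, f2, "low unary configuration"
```

With $\text{eff}_u(\gamma) = -1$, moving iterations to the left makes the unary counter drop earlier; depending on how large the counter is at the start of the subsegment, either all iterations end up on the left or a configuration with unary value below $|Q|$ appears first.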

Fig. 6. Reshuffling around a path $\Lambda$ (blue) where $\gamma$ (red) is positive-negative. Before reshuffling, all configurations in the run over the path subsegment $\cdots \gamma \Lambda \gamma \cdots$ have unary counter value at least $|Q|$ (left). After reshuffling, in the run over the path subsegment $\cdots \gamma \gamma \Lambda \cdots$, there is a configuration with unary counter value less than $|Q|$ (right).

We claim that after each reshuffling step, the corresponding run remains executable, so we must check that both counters remain non-negative. Notice that by only moving a cycle, the total effect of the path subsegment remains the same. Therefore, if the run was executable before reshuffling, we can safely assume that the prefix before the path subsegment and the suffix after the path subsegment are still executable. For that reason, consider the counter values of configurations occurring in the run over the reshuffled path subsegment. We focus on a single step of the reshuffling procedure that concerns the subsegment $\gamma^{f_{2j-1}} \Lambda_{2j-1} \gamma^{f_{2j}}$.

Suppose $\gamma$ is a positive-negative cycle. Then the reshuffling procedure moves $\gamma$ from right to left. We claim that since $f_{2j-1} > 0$ and $\Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{2j-2} \gamma^{f_{2j-1}}$ is executable, the subsegment $\Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{2j-2} \gamma^{f_{2j-1}+1}$ is executable from the initial configuration. This is because one prerequisite of the reshuffling procedure is that all configurations occurring in the run over the path subsegment have at least $|Q|$ unary counter value. Moreover, the cycle $\gamma$ has length at most $|Q|$, so $\text{grd}_u(\gamma) \ge -|Q|$ means the unary counter value remains non-negative. As for the binary counter value, since a single execution of $\gamma$ increases the binary counter and an iteration of $\gamma$ was already executed before reshuffling, $\Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{2j-2} \gamma^{f_{2j-1}+1}$ is executable. In the same way, from the initial configuration, $\Lambda_0 \gamma^{f_1} \Lambda_1 \cdots \Lambda_{2j-2} \gamma^{f_{2j-1}+1} \Lambda_{2j-1} \gamma^{f_{2j}-1}$ is executable. This is because $\text{eff}_u(\gamma) \ge -|Q|$ and, again, all configurations occurring in the run over the path subsegment have at least $|Q|$ unary counter value, and also because of the monotonicity on the binary counter.

The argument when γ is a negative-positive cycle is analogous. This concludes the correctness analysis of the reshuffling procedure.

Finishing Reshuffling. We analyse what happens when reshuffling is finished. Suppose that there exists a pair $2j - 1$ and $2j$ such that the reshuffling finishes under the first condition, where all iterations of $\gamma$ have been moved to one side of $\Lambda_{2j-1}$. In this case we obtain a new linear form $\ell''$ for $\rho$, where one collection of the cycle $\gamma$ has been removed (decrementing $k$). So $(C(\ell''))_2 = k - 1 < (C(\ell))_2$, and the two adjacent path segments can be combined without changing the summed length of paths, so $(C(\ell''))_1 = (C(\ell))_1$. Therefore, $C(\ell'') \prec_{lex} C(\ell)$, contradicting the assumption that $\ell$ has the minimal cost.

Otherwise, for every $1 \le j \le x/2$ the reshuffling of the pair $2j - 1$ and $2j$ finishes under the second condition. So there is a configuration with unary counter value less than $|Q|$ in the run induced from the path $\rho$ for each pair $2j - 1$ and $2j$ (see Figure 7). Recall that $x/2 = |Q|^2 + 2$ is the number of pairs. Akin to the second observation (in the beginning of this proof), we use the pigeonhole principle on the number of such configurations to obtain two configurations with matching states and equal unary counter values. The path segment inducing the part of the run between these two configurations is a monotone cycle, regardless of the binary effect. Again, it must be true that the length of this cycle is less than the length of the whole path, so we obtain a monotone cycle decomposition of $\rho$. Thus case (ii) of the lemma holds.

Fig. 7. After reshuffling is finished under the second condition, we can find a zero unary effect cycle using the (sufficiently many) configurations with unary counter value less than $|Q|$.

### 5.2 Applying Reshuffling

Lemma 3 does not necessarily return a narrow linear form for a path $\pi$ witnessing coverability. Instead it may return a monotone cycle decomposition $(\rho, \sigma, \tau)$ of $\pi$. Our next goal is to show that there exist polynomial size certificates for $\rho$ and $\sigma$ (Lemma 4), and then to show that there exists a polynomial size certificate for $\tau$ (Lemma 5). Like linear forms, there can be many monotone cycle decompositions for a path. In the following, we use the cost function assigning to each monotone cycle decomposition a pair of natural numbers, $D((\rho, \sigma, \tau)) := (\text{len}(\rho\sigma), \text{len}(\sigma))$. Note that we can compare two decompositions using their cost, even if they are for two different paths.

Lemma 4. *Suppose* $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v})$ *yet there is no narrow linear form for any path* $\pi$ *such that* $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{w})$ *and* $\mathbf{w} \ge \mathbf{v}$*. Then there exists a path* $\pi'$ *such that*

*(a)* $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{w}')$ *where* $\mathbf{w}' \ge \mathbf{v}$*,*

*(b) there is a monotone cycle decomposition* $(\rho, \sigma, \tau)$ *of* $\pi'$ *where* $\text{eff}(\sigma) > \mathbf{0}$*, and*

*(c) there are narrow linear forms for both* $\rho$ *and* $\sigma$*.*

*Proof.* We will again use the lexicographical order $\prec_{lex}$ to compare the cost of monotone cycle decompositions. Let $\pi$ be a path of minimum length such that $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{w})$ where $\mathbf{w} \ge \mathbf{v}$. Let $c = (\rho, \sigma, \tau)$ be the monotone cycle decomposition of $\pi$ that minimises the cost $D(c)$ under the $\prec_{lex}$ order. Such a decomposition must exist, otherwise applying Lemma 3 would return a narrow linear form for a path $\rho$ such that $p(\mathbf{u}) \xrightarrow{\rho} q(\mathbf{w}')$ and $\mathbf{w}' \ge \mathbf{w} \ge \mathbf{v}$, contradicting an assumption of this lemma. Observe that $\text{eff}(\sigma) > \mathbf{0}$, otherwise one can remove $\sigma$ and consider the shorter path $\rho\tau$, contradicting the minimal length of $\pi$. Next, we argue that $\rho$ and $\sigma$ do not have monotone cycle decompositions; we then leverage Lemma 3 to obtain the narrow linear forms required.

*Path* $\rho$ *cannot be decomposed further.* Towards a contradiction, assume that there is a monotone cycle decomposition $c' = (\rho', \sigma', \tau')$ of $\rho$. Observe that the following monotone cycle decomposition $c'' = (\rho', \sigma', \tau'\sigma\tau)$ of $\pi$ has lower cost $D(c'') \prec_{\mathrm{lex}} D(c)$, as $(D(c''))_1 = \mathrm{len}(\rho') + \mathrm{len}(\sigma') < \mathrm{len}(\rho) + \mathrm{len}(\sigma) = (D(c))_1$. This contradicts the assumption that $(\rho, \sigma, \tau)$ has minimum cost.

Suppose $p(\mathbf{u}) \xrightarrow{\rho} p'(\mathbf{x})$. Since there is no monotone cycle decomposition, applying Lemma 3 to $\rho$ returns a path $\rho'$ with a narrow linear form such that $p(\mathbf{u}) \xrightarrow{\rho'} p'(\mathbf{x}')$ where $\mathbf{x}' \ge \mathbf{x}$ and $\mathrm{len}(\rho') \le \mathrm{len}(\rho)$.

*Cycle* $\sigma$ *cannot be decomposed further.* Towards a contradiction, assume that there is a monotone cycle decomposition $(\rho', \sigma', \tau')$ of $\sigma$. Observe that the following monotone cycle decomposition $c' = (\rho\rho', \sigma', \tau'\tau)$ of $\pi$ has lower cost $D(c') \prec_{\mathrm{lex}} D(c)$, as $(D(c'))_1 = \mathrm{len}(\rho) + \mathrm{len}(\rho') + \mathrm{len}(\sigma') \le \mathrm{len}(\rho) + \mathrm{len}(\sigma) = (D(c))_1$ and $(D(c'))_2 = \mathrm{len}(\sigma') < \mathrm{len}(\sigma) = (D(c))_2$. This contradicts the assumption that $(\rho, \sigma, \tau)$ has minimum cost.

Suppose $p'(\mathbf{x}) \xrightarrow{\sigma} p'(\mathbf{y})$. Since there is no monotone cycle decomposition, applying Lemma 3 to $\sigma$ returns a path $\sigma'$ with a narrow linear form such that $p'(\mathbf{x}) \xrightarrow{\sigma'} p'(\mathbf{y}')$ where $\mathbf{y}' \ge \mathbf{y}$ and $\mathrm{len}(\sigma') \le \mathrm{len}(\sigma)$. In particular, it is also true that $\mathrm{eff}(\sigma') \ge \mathrm{eff}(\sigma) > \mathbf{0}$.


Replacing $\rho$ with $\rho'$ and $\sigma$ with $\sigma'$ in $\pi$ yields a path $\pi'$. Clearly, if $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{w})$ where $\mathbf{w} \ge \mathbf{v}$, then $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{w}')$ where $\mathbf{w}' \ge \mathbf{w} \ge \mathbf{v}$. Finally, $(\rho', \sigma', \tau)$ is a monotone cycle decomposition of $\pi'$ such that $\mathrm{eff}(\sigma') > \mathbf{0}$ and $\rho'$ and $\sigma'$ have narrow linear forms, as required.

We now aim to obtain a narrow linear form for τ. Note that Lemma 4 gives us a monotone cycle σ with positive effect on at least one counter, i.e. eff(σ) > **0**. By pumping σ we can force one of the counters to take an arbitrarily large value (in what follows, the vector **x** reflects this large value for Lemma 5). Then, loosely speaking, the problem reduces to coverability in 1-VASS. However, proving the existence of a polynomial-size compressed linear form path in Theorem 1 requires more care. Note that Lemma 5 is stated for 2-VASS (not necessarily with one unary counter). First, we need to recall the following bound on counter values observed throughout runs. Recall that $|V|_{\max} := |Q| + |T| \cdot |T|_{\max}$ is the pseudopolynomial size of the input.

Theorem 2 (Corollary from Theorem 3.2 in [4]). *Consider a 2-VASS (with both counters in binary)* $V = (Q, T)$ *and let* $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v})$*, then there exists a run* $p(\mathbf{u}) = q_0(\mathbf{v}_0), q_1(\mathbf{v}_1), \ldots, q_m(\mathbf{v}_m) = q(\mathbf{v})$ *such that* $|\mathbf{v}_0|_{\max}, |\mathbf{v}_1|_{\max}, \ldots, |\mathbf{v}_m|_{\max} \le (|V|_{\max} + |\mathbf{u}|_{\max} + |\mathbf{v}|_{\max})^{O(1)}$*.*

In the following lemma, which is proved in the appendix, given a 2-VASS $V$, an initial configuration $p(\mathbf{u})$, and a target configuration $q(\mathbf{v})$, we write $B$ in place of $(|V|_{\max} + |\mathbf{u}|_{\max} + |\mathbf{v}|_{\max})^{O(1)}$ from Theorem 2 and we fix $\mathbf{x} = (4B|Q|^2|V|_{\max}^2, 0)$.

Lemma 5. *Consider a 2-VASS (with both counters in binary)* $V = (Q, T)$ *and let* $p(\mathbf{u}) \xrightarrow{*} q(\mathbf{v})$*, then there exists a narrow linear form path* $\pi'$ *such that* $p(\mathbf{u} + \mathbf{x}) \xrightarrow{\pi'} q(\mathbf{v}')$ *for some* $\mathbf{v}' \ge \mathbf{v}$*.*

## 6 Proof of Theorem 1

Before proving Theorem 1, we employ the fact that for a general 2-VASS, not necessarily with one unary counter, the exponents of cycles in linear forms can be pseudo-polynomially bounded.

Lemma 6 (Corollary from Lemma 18 in [5]). *Let* $\pi$ *be a path in a 2-VASS with a linear form* $\pi = \tau_0 \gamma_1^{f_1} \tau_1 \cdots \gamma_k^{f_k} \tau_k$ *such that* $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v})$*. Then there exists a path* $\pi' = \tau_0 \gamma_1^{e_1} \tau_1 \cdots \tau_{k-1} \gamma_k^{e_k} \tau_k$ *such that* $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{v}')$ *where* $\mathbf{v}' \ge \mathbf{v}$ *and* $\mathrm{bitsize}(e_1), \ldots, \mathrm{bitsize}(e_k)$ *are all bounded by a polynomial in* $|V| + \mathrm{bitsize}(\mathbf{u}) + \mathrm{bitsize}(\mathbf{v})$*.*

*Proof of Theorem 1.* Let $p(\mathbf{u}) \xrightarrow{\pi} q(\mathbf{v})$ for some path $\pi$. If there is a narrow linear form for $\pi$, then by Lemma 6 we obtain $\pi' = \tau_0 \gamma_1^{e_1} \tau_1 \cdots \tau_{k-1} \gamma_k^{e_k} \tau_k$ such that $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{v}')$ where $\mathbf{v}' \ge \mathbf{v}$ and $\mathrm{bitsize}(e_1), \ldots, \mathrm{bitsize}(e_k)$ are all bounded above by a polynomial in $|V| + \mathrm{bitsize}(\mathbf{u}) + \mathrm{bitsize}(\mathbf{v})$. Since this is a narrow linear form, we know that $k \le P(|Q|)$, so $\sum_{i=1}^{k} \mathrm{len}(\gamma_i) \le k|Q| \le |Q| P(|Q|)$, and we also know that $\sum_{i=0}^{k} \mathrm{len}(\tau_i) \le |Q|(P(|Q|) + 1)$. Together, this implies that the linear form path $\pi'$ is of polynomial size.

It remains to consider the case when there is no narrow linear form for $\pi$. By Lemma 4 (via Lemma 3) there exists a path $\pi'$ such that $p(\mathbf{u}) \xrightarrow{\pi'} q(\mathbf{v}')$ and $\mathbf{v}' \ge \mathbf{v}$. Moreover, there is a monotone cycle decomposition $(\rho, \sigma, \tau)$ of $\pi'$ such that $\mathrm{eff}(\sigma) > \mathbf{0}$ and there are narrow linear forms for both $\rho$ and $\sigma$.

Assume that $(\mathrm{eff}(\sigma))_1 > 0$. This is without loss of generality because if $(\mathrm{eff}(\sigma))_1 = 0$ then one can flip the coordinates in $V$, $\mathbf{u}$ and $\mathbf{v}$ (for the remainder of the proof it will not matter that one counter is unary). Let $p'(\mathbf{m})$ be the configuration such that $p(\mathbf{u}) \xrightarrow{\rho} p'(\mathbf{m}) \xrightarrow{\sigma\tau} q(\mathbf{v}')$. Observe that, since $\mathrm{eff}(\sigma) > \mathbf{0}$, for every $i \in \mathbb{N}$ the path $\rho\sigma^i$ induces the run $p(\mathbf{u}) \xrightarrow{\rho\sigma^i} p'(\mathbf{m} + i \cdot \mathrm{eff}(\sigma))$. Consider $x = (\mathbf{x})_1 = 4B|Q|^2|V|_{\max}^2$ (for Lemma 5); clearly $x$ is large enough so that $p(\mathbf{u}) \xrightarrow{\rho\sigma^x} p'(\mathbf{m}')$ and $\mathbf{m}' \ge \mathbf{m} + \mathbf{x}$. By Lemma 5 there exists a narrow linear form for a path $\tau'$ such that $p'(\mathbf{m}') \xrightarrow{\tau'} q(\mathbf{v}'')$ and $\mathbf{v}'' \ge \mathbf{v}'$.

We conclude by considering the compressed linear form path $\rho\sigma^x\tau'$ such that $p(\mathbf{u}) \xrightarrow{\rho\sigma^x\tau'} q(\mathbf{v}'')$ and $\mathbf{v}'' \ge \mathbf{v}' \ge \mathbf{v}$. Since $\rho$, $\sigma$, and $\tau'$ have narrow linear forms, we can also bound the exponents using Lemma 6 as in the beginning of this proof. Finally, $\mathrm{bitsize}(x)$ is polynomial in $|V| + \mathrm{bitsize}(\mathbf{u}) + \mathrm{bitsize}(\mathbf{v})$, much like the exponents of the cycles in the linear forms. Therefore, the size of the compressed linear form $\rho\sigma^x\tau'$ is polynomial in $|V| + \mathrm{bitsize}(\mathbf{u}) + \mathrm{bitsize}(\mathbf{v})$.
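The role of a compressed linear form as a small certificate can be illustrated with a short sketch. It abstracts every transition to its counter update (control states are ignored, which is a simplifying assumption), stores exponents as ordinary integers, and evaluates a path such as ρσ^xτ′ without unrolling the cycle. It relies on the standard observation that counter values inside the j-th iteration of a cycle depend linearly on j, so checking the first and the last iteration suffices. All names (`run`, `run_power`, `run_compressed`) and the toy updates are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

Vec = Tuple[int, int]
Path = List[Vec]  # each transition is abstracted to its effect on the two counters


def add(u: Vec, v: Vec) -> Vec:
    return (u[0] + v[0], u[1] + v[1])


def effect(path: Path) -> Vec:
    """Sum of all updates along the path."""
    total = (0, 0)
    for update in path:
        total = add(total, update)
    return total


def run(start: Vec, path: Path) -> Optional[Vec]:
    """Follow the path once; return None if a counter drops below zero."""
    current = start
    for update in path:
        current = add(current, update)
        if min(current) < 0:
            return None
    return current


def run_power(start: Vec, cycle: Path, exponent: int) -> Optional[Vec]:
    """Follow cycle^exponent by checking only the first and the last iteration."""
    if exponent == 0:
        return start
    if run(start, cycle) is None:
        return None
    eff = effect(cycle)
    last_start = add(start, ((exponent - 1) * eff[0], (exponent - 1) * eff[1]))
    return run(last_start, cycle)


def run_compressed(start: Vec, segments: List[Tuple[Path, int]]) -> Optional[Vec]:
    """Evaluate a sequence of (path, exponent) pairs from a non-negative start vector."""
    current: Optional[Vec] = start
    for path, exponent in segments:
        current = run_power(current, path, exponent)
        if current is None:
            return None
    return current


# Toy instance: rho, then a cycle sigma with effect (1, 0) pumped a million times, then tau.
rho: Path = [(1, 0)]
sigma: Path = [(1, -1), (0, 1)]
tau: Path = [(-5, 0)]
final = run_compressed((0, 1), [(rho, 1), (sigma, 10**6), (tau, 1)])
```

The running time depends on the lengths of the segments and on the bit size of the exponents, not on their values, which is why bounding both polynomially matters for the NP upper bound.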

## 7 Conclusion and Future Work

In this paper we proved that coverability in 2-VASS with one unary counter is in NP, a drop in complexity from PSPACE for general 2-VASS. We achieve this with new techniques. Most notably, we polynomially bound the number of short cycles that need to be used (Section 4). We then obtain a polynomial-size linear form path by replacing short cycles and reshuffling the path (Section 5).

A natural extension is to consider whether coverability in 3-VASS with one binary counter and two unary counters is also in NP. More generally, there is the problem of determining the complexity of coverability in k-VASS with one binary counter and k − 1 unary counters. The technique for polynomially bounding the number of short cycles that need to be used can easily be generalised to these higher-dimensional VASS with only one binary counter. However, it is not clear how to modify and use our reshuffling technique. Another open problem is whether reachability in 2-VASS with one unary counter is also in NP. Note that completeness would immediately follow from the fact that reachability in binary encoded 1-VASS is NP-hard [22].


## References



Ahmedabad, India, volume 122 of LIPIcs, pages 31:1–31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.FSTTCS.2018.31.


*Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part II*, volume 9135 of *Lecture Notes in Computer Science*, pages 324–336. Springer, 2015. doi:10.1007/978-3-662-47666-6_26.



## On History-Deterministic One-Counter Nets

Aditya Prakash and K. S. Thejaswini

Department of Computer Science, University of Warwick, Coventry, UK {aditya.prakash,thejaswini.raghavan.1}@warwick.ac.uk

Abstract. We consider the model of history-deterministic one-counter nets (OCNs). History-determinism is a property of transition systems that allows for a limited kind of non-determinism which can be resolved 'on-the-fly'. Token games, which have been used to characterise history-determinism over various models, also characterise history-determinism over OCNs. By reducing 1-token games to simulation games, we are able to show that checking for history-determinism of OCNs is decidable. Moreover, we prove that this problem is **PSPACE**-complete for a unary encoding of transitions and **EXPSPACE**-complete for a binary encoding, and that it is undecidable for one-counter automata (OCA), which are OCNs that can test for zero.

We then study the language properties of history-deterministic OCNs. We show that the resolvers of non-determinism for history-deterministic OCNs are eventually periodic. As a consequence, for a given history-deterministic OCN, we construct a language equivalent deterministic OCA. We also show the decidability of comparing languages of history-deterministic OCNs, such as language inclusion and language universality.

Keywords: History-determinism · Token games · One-counter nets · One-counter automaton.

## 1 Introduction

While deterministic automata are algorithmically efficient for problems such as synthesis or for solving games, they are often much less succinct or less expressive than their non-deterministic counterparts. As such, many intermediate models between determinism and non-determinism have been studied [1,2,3,4,5], with history-determinism being one such well-studied notion in recent years. History-deterministic automata over infinite words with parity acceptance conditions were introduced by Henzinger and Piterman as a tool to solve verification games, although dubbed good-for-games in their work [6]. Such automata are known to be exponentially more succinct than their deterministic counterparts [7], and are known to form a robust class of automata that is both algorithmically and conceptually interesting [6,8,9,7,10,11,12,13,14].

The notion of history-determinism emerged independently in the setting of cost automata, where history-deterministic automata can capture all regular cost functions, as opposed to their deterministic version [15]. Recently, history-determinism has been studied in quantitative settings [16,17], as well as in infinite-state systems such as pushdown automata [18,19], Parikh automata [20], and timed automata [21,22], where history-deterministic automata are often more succinct and expressive than their deterministic counterparts.

One-counter nets are finite-state systems along with a counter that stores a non-negative integer value which can never be explicitly tested for zero. They correspond to 1-dimensional VASS and to Petri nets with exactly one unbounded place. They are a subclass of one-counter automata, namely those without zero tests, and hence are also a subclass of pushdown automata. They are one of the simplest infinite-state systems, and hence many problems are easier for one-counter nets than for the models that subsume them.

The structure of the resolvers that resolve non-determinism on-the-fly is crucial to understanding history-determinism in various models. While for automata over infinite words with parity conditions these resolvers take the shape of deterministic parity automata [6], the situation for resolvers in history-deterministic infinite-state systems is not as well understood. Indeed, the computability of such a resolver for a given history-deterministic pushdown automaton is left as an open problem in the work of Guha, Jecker, Lehtinen and Zimmermann [18]. For history-deterministic Parikh automata, it is still an open problem whether the resolver can be given by a deterministic Parikh transducer [20]. Moreover, many other problems, such as deciding history-determinism or even language inclusion among history-deterministic automata, are undecidable for pushdown automata and Parikh automata [18,19,20]. We consider history-determinism over one-counter nets, where we are able to answer all of the above questions positively.

To answer several of these questions, we use results and techniques from the simulation problem over one-counter nets [23,24]. This is not surprising, since simulation of various models has close ties with history-determinism [6,21].

Our Contribution We study history-deterministic OCNs and establish them as a class of infinite-state systems where many problems pertaining to history-determinism are decidable. This is unlike many other classes of history-deterministic infinite-state systems that have been studied so far.

Firstly, we show that checking for history-determinism of a given one-counter net is **PSPACE**-complete when the transitions are encoded in unary, and is **EXPSPACE**-complete for a more succinct encoding (Theorem 4, Theorem 26). We achieve the upper bound by giving a novel reduction from the one-token game [11] to the simulation problem over OCNs. One-token games characterise history-determinism over OCNs, and thus our reduction further extends the link between history-determinism and simulation. This decidability result is in contrast to one-counter automata (OCA), where checking for history-determinism becomes undecidable by just adding zero-tests to OCNs (Theorem 27).

Secondly, we show that resolvers for non-determinism in history-deterministic OCNs can be expressed as an eventually periodic set. Using this, we are able to determinise history-deterministic OCNs to give a language equivalent deterministic OCA.


Finally, we show that the problems of language inclusion and language universality for history-deterministic OCNs are in **PSPACE** and **P** respectively. This is unlike non-deterministic OCNs, where these problems are known to be undecidable and Ackermann-complete respectively. Even for the class of deterministic OCA, to which we show history-deterministic OCNs can be converted, the inclusion problem is known to be undecidable.

Good-for-Gameness A notion closely related to history-determinism (HD) is that of good-for-gameness. An automaton is said to be good-for-games (GFG) if its composition with a game whose acceptance condition is given by the language of the automaton yields an equivalent game. For parity automata over infinite words, these two notions are known to be equivalent [6,25], but they do not coincide on all models [16]. For the purposes of our paper, we deal with history-deterministic OCNs, as in our setting the notion of history-determinism is equivalent to good-for-gameness when composition with infinitely branching games is considered [26]. We note however, that this is not true when compositionality is restricted to only finitely branching games [26].

## 2 Preliminaries

We use $\mathbb{N}$ to denote the set of positive integers and $\mathbb{N}_0$ to denote the set of non-negative integers. An alphabet, denoted by $\Sigma$, is any finite non-empty set of letters, and the set of all finite words over $\Sigma$ is denoted by $\Sigma^*$. The empty word over $\Sigma$ is denoted by $\varepsilon$, and we use $\Sigma_\varepsilon$ to denote the set $\Sigma \cup \{\varepsilon\}$. A language $L$ over $\Sigma$ is a subset of $\Sigma^*$.

Labelled Transition System A labelled transition system (LTS) is a tuple $S = (Q, \Sigma, \to, q_0, F)$. In this paper, we assume that $Q$ is a (countable) set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of final states, $\Sigma$ is a finite alphabet, and ${\to} \subseteq Q \times \Sigma_\varepsilon \times Q$ is the set of transitions.

If a transition $(q_1, a, q_2)$ belongs to $\to$, we also write it as $q_1 \xrightarrow{a} q_2$. A run of the labelled transition system $S$ on a finite word $w$ is a finite alternating sequence of states and letters of $\Sigma_\varepsilon$, $\rho = q_0 \xrightarrow{a_0} q_1 \xrightarrow{a_1} \cdots q_{k-1} \xrightarrow{a_{k-1}} q_k$, where for each $i$ we have that $q_i \xrightarrow{a_i} q_{i+1}$ is a transition and $a_i \in \Sigma_\varepsilon$, and such that $w = a_0 \cdot a_1 \cdots a_{k-1}$. A run $\rho$ as described above is accepting if $q_k \in F$.

An LTS that has no $\varepsilon$-transitions is said to be a realtime LTS. For a realtime LTS $S = (Q, \Sigma, \to, q_0, F)$, we have ${\to} \subseteq Q \times \Sigma \times Q$. Unless mentioned otherwise, we mostly deal with realtime LTS for the sake of a simpler presentation. An LTS $S = (Q, \Sigma, \to, q_0, F)$ is deterministic if $\to$ is a function from $Q \times \Sigma$ to $Q$ and not just a relation.

Two player games Throughout the paper, we use two-player games on countable arenas between the players Adam and Eve, denoted by ∀ and ∃ respectively. The winning condition will be a reachability condition for one of the players, often ∀. These games can be interpreted as Gale-Stewart games [27], and we know that such games are determined, that is, they have a winner, which is either ∀ or ∃. Moreover, the winning player has a positional winning strategy, that is, a strategy whose choices depend only on the current position in the arena. We say that two games are *equivalent* if they have the same winner.

*One-Counter Automata* A *one-counter automaton* (OCA) $\mathcal{A}$ is given by a tuple $\mathcal{A} = (Q, \Sigma, \Delta, q_0, F)$, where $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of final states, $\Sigma$ is a finite alphabet, and $\Delta$ is the set of transitions, given as a relation $\Delta \subseteq Q \times \{\mathrm{zero}, \neg\mathrm{zero}\} \times \Sigma \times \{-1, 0, 1\} \times Q$.

Here, the symbols zero and ¬zero are used to distinguish between transitions that can happen when the counter value is 0, and when the counter value is positive respectively. One can think of the counter as a stack, where the stack has a distinguished bottom-of-the-stack symbol, which cannot be popped. The configurations in the automaton are given by pairs $(q, m)$, where $q$ denotes the current state, and $m \in \mathbb{N}_0$ denotes the counter value. We use $C(\mathcal{A})$ to denote the set of configurations of $\mathcal{A}$.

A one-counter automaton generates an infinite-state LTS over the set of configurations $Q \times \mathbb{N}_0$, such that the transitions are as defined below. For each configuration $(q, m)$, upon reading $a \in \Sigma_\varepsilon$,


For two configurations $c, c' \in C(\mathcal{A}) = Q \times \mathbb{N}_0$, we use the notation $c \xrightarrow{a,d} c'$ to denote the fact that $c'$ can be reached from $c$ by taking some transition $\delta \in \Delta$ upon reading $a$, with a change of counter value $d$. We shall also say that $c \xrightarrow{a,d} c'$ is a transition in $\mathcal{A}$, as $c \xrightarrow{a,d} c'$ is a transition in the infinite LTS of $\mathcal{A}$. We thus view $\mathcal{A}$ as both an automaton and an LTS (generated by $\mathcal{A}$), and switch freely between these two views. A run of $\mathcal{A}$ over a word $w$ is a finite sequence of alternating configurations and transitions $\rho = c_0 \xrightarrow{a_0,d_0} c_1 \cdots c_n \xrightarrow{a_n,d_n} c_{n+1}$ such that $a_0 a_1 \cdots a_n = w$ and $c_0 = (q_0, 0)$. The run $\rho$ is an *accepting run* if its last configuration $c_{n+1} = (q_{n+1}, k_{n+1})$ is accepting, i.e. $q_{n+1} \in F$. We say a word $w$ is an *accepting word* in $\mathcal{A}$ if it has an accepting run in $\mathcal{A}$. Finally, we define the language of $\mathcal{A}$, denoted by $L(\mathcal{A})$, to be the set of all accepting words in $\mathcal{A}$. We say that $\mathcal{A}$ is a *deterministic one-counter automaton* if $\Delta$ is a (partial) function from $Q \times \{\mathrm{zero}, \neg\mathrm{zero}\} \times \Sigma$ to $\{-1, 0, 1\} \times Q$.
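As an illustration of how the zero and ¬zero guards shape the generated LTS, here is a minimal sketch of the one-step successor relation of a realtime OCA. The tuple encoding, the boolean flag standing for zero (`True`) versus ¬zero (`False`), the function name `oca_step`, and the toy transition set are assumptions made for this example; in particular, the requirement that the counter stays non-negative is spelled out explicitly here.

```python
from typing import Set, Tuple

Config = Tuple[str, int]                      # (state, counter value)
Transition = Tuple[str, bool, str, int, str]  # (state, fires_at_zero, letter, change, next state)


def oca_step(delta: Set[Transition], config: Config, letter: str) -> Set[Config]:
    """Successor configurations of `config` upon reading `letter`."""
    state, counter = config
    successors = set()
    for (source, fires_at_zero, a, change, target) in delta:
        # A zero transition is enabled only at counter value 0,
        # a non-zero transition only at positive counter values.
        guard_ok = fires_at_zero == (counter == 0)
        if source == state and a == letter and guard_ok and counter + change >= 0:
            successors.add((target, counter + change))
    return successors


# Toy automaton: 'a' always increments; 'b' decrements while the counter is
# positive and moves to the state "empty" once the counter is zero.
DELTA: Set[Transition] = {
    ("q", True, "a", 1, "q"),
    ("q", False, "a", 1, "q"),
    ("q", False, "b", -1, "q"),
    ("q", True, "b", 0, "empty"),
}
```

In this toy automaton the letter `b` behaves differently at counter value 0 than at positive values, which is exactly the ability that one-counter nets lack.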

*One-counter nets* The model of *one-counter nets* (OCNs) can be seen as a restriction of one-counter automata that removes the ability to test for zero. Alternatively, one can view an OCN as a finite-state automaton that has access to a stack which can store only one symbol and has no bottom-of-the-stack element. Any feasible run cannot pop an empty stack. More formally, a one-counter net $\mathcal{N}$ is a tuple $(Q, \Sigma, \Delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $q_0 \in Q$ is the initial state and $F \subseteq Q$ is the set of final or accepting states. The set $\Delta \subseteq Q \times \Sigma \times \{-1, 0, 1\} \times Q$ is the set of transitions of the net $\mathcal{N}$.

The configurations of an OCN are similar to those of an OCA: a configuration is a pair $(q, n) \in Q \times \mathbb{N}_0$. We shall use the notation $C(\mathcal{N}) = Q \times \mathbb{N}_0$ to denote the set of configurations of $\mathcal{N}$. From a configuration $(q, n)$, we reach a configuration $(p, n + d)$ in one step if there is a transition $\delta = (q, a, d, p)$ for some $a \in \Sigma$ and $d \in \{-1, 0, +1\}$ with $n + d \ge 0$. Runs, accepting runs and accepting words of an OCN are defined as for an OCA. We say an OCN $\mathcal{N}$ is *complete* if for every configuration $c \in C(\mathcal{N})$ and every letter $a \in \Sigma$, there exists a transition $c \xrightarrow{a,d} c'$.
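The step relation and acceptance condition described above can be sketched as follows. The encoding of transitions as tuples `(q, a, d, p)`, the function names `ocn_step` and `ocn_accepts`, and the toy net are assumptions made for illustration. Acceptance is decided by tracking the set of all configurations reachable on the word read so far.

```python
from typing import Iterable, Set, Tuple

Config = Tuple[str, int]               # (state, counter value)
Transition = Tuple[str, str, int, str]  # (q, a, d, p) as in the definition of Delta


def ocn_step(delta: Set[Transition], config: Config, letter: str) -> Set[Config]:
    """All configurations (p, n + d) reachable from (q, n) in one step on `letter`."""
    q, n = config
    return {(p, n + d) for (s, a, d, p) in delta
            if s == q and a == letter and n + d >= 0}


def ocn_accepts(delta: Set[Transition], q0: str, final: Iterable[str], word: str) -> bool:
    """True if some run on `word` from (q0, 0) ends in a final state."""
    final_states = set(final)
    configs = {(q0, 0)}
    for letter in word:
        configs = {nxt for c in configs for nxt in ocn_step(delta, c, letter)}
    return any(q in final_states for (q, _) in configs)


# Toy net for the language { a^n b^m : m <= n }: every 'a' increments the
# counter, every 'b' decrements it, and a decrement at 0 is simply disabled.
DELTA: Set[Transition] = {
    ("A", "a", 1, "A"),
    ("A", "b", -1, "B"),
    ("B", "b", -1, "B"),
}
FINAL = {"A", "B"}
```

Note that a transition enabled at counter value n is also enabled at every larger value, so the net cannot detect that the counter is zero other than by being blocked.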

*Remark 1.* For most of the paper we talk about one-counter nets (automata) with unary transitions, i.e. transitions that increment or decrement the counter by at most 1. However, they are as expressive as succinct models where the one-counter net has a *binary encoding*, i.e. where the transitions allow the counter to be incremented or decremented by positive integers represented in binary. This can be observed, for instance, by giving a construction similar to that of Valiant's for deterministic pushdown automata ([28], Section 1.7).

*History-Deterministic One-Counter Nets* We define history-determinism in the setting of one-counter nets. Informally, an OCN $\mathcal{N}$ is *history-deterministic* if the non-deterministic choices required to accept a word $w \in L(\mathcal{N})$ can be made on-the-fly. These choices depend only on the word read so far, and do not require knowledge of the future of the word to construct an accepting run for a word in $L(\mathcal{N})$ (hence the term history-determinism). Formally, we say an OCN $\mathcal{N}$ is history-deterministic if ∃ wins the letter game on $\mathcal{N}$ defined below.

Definition 2 (Letter game for OCN). *Given an OCN* $\mathcal{N} = (Q, \Sigma, \Delta, q_0, F)$*, the letter game on* $\mathcal{N}$ *is defined between the players* ∀ *and* ∃ *as follows: the positions of the game are* $C(\mathcal{N}) \times \Sigma^*$*, with the initial position* $((q_0, 0), \varepsilon)$*. At round* $i$ *of the play, where the position is* $(c_i, w_i)$*:*


*If* ∃ *is unable to choose a transition (i.e. there is no* $a_i$*-transition at the configuration* $c_i$ *in the LTS generated by the net* $\mathcal{N}$*), and* $w_{i+1} = w_i a_i$ *is the prefix of an accepting word,* ∃ *loses immediately. The player* ∀ *wins immediately when the word* $w_{i+1}$ *is accepting but the configuration* $c_{i+1}$ *is not at an accepting state, and the game terminates. The game continues from* $(c_{i+1}, w_{i+1})$ *otherwise. Player* ∃ *wins any infinite play.*

We say a strategy for ∃ in the letter game of N is a *resolver* for N , if it is a winning strategy for ∃ in the letter game.

Our characterization of history-deterministic one-counter nets by the above letter game is slightly different from the one presented in the work of Guha, Jecker, Lehtinen, and Zimmermann [18] for pushdown automata. In their work, they define history-determinism as having a consistent strategy based on the transitions taken so far. It is easy to argue that these two definitions are equivalent.

The letter game can be formulated as a reachability game over countably many vertices, where the player ∀ is trying to reach a position of the form $(c, w) \in C(\mathcal{N}) \times \Sigma^*$, where $c$ is at a rejecting state, while $w$ is accepting. As such games are determined [27], the notion of history-determinism formulated as ∃ winning the letter game is well-defined.

Letter games have been used extensively to characterise history-determinism for other models as well, such as parity automata [6] and for various kinds of quantitative and timed automata on both finite and infinite words [12,16,21].

To aid the understanding of history-determinism and of the above definition, we provide an example of an OCN on which ∃ wins the letter game, but with a strategy that depends on the counter value of her configuration.

*Example 3.* Consider the language

$$\mathcal{L} = \left\{ a^n \$ b^{n_1} \$ b^{n_2} \$ \dots \$ b^{n_k} \$ \;\middle|\; \sum_{i=1}^{k} n_i \le n \text{ or } n_k = 2, \sum_{i=1}^{k-1} n_i = n - 1 \right\}$$

which can be accepted by a history-deterministic OCN as shown in Figure 1. The initial state is indicated with an arrow pointing to it, and the final states are double-circled. Missing transitions are assumed to go to a rejecting sink state. In the corresponding letter game, ∀ plays the letter a several times, say n times, followed by a \$. The corresponding transitions so far are deterministic. Later, ∀ reads some series of bs and \$s, such that the word continues to be in the language. Note that the non-determinism occurs in only one state, which is marked with an X, upon reading the letter b. A winning strategy of ∃ which proves that this net is history-deterministic is the following: she takes the 'down' transition if the counter value is strictly larger than 1, and the 'right' transition on b otherwise. This non-determinism cannot be resolved by removing transitions, because removing either the 'down' b-transition or the 'right' b-transition changes the language accepted. We note that an equivalent deterministic OCN exists nevertheless: reading the first b after any \$ does not change the value of the counter, the second b after a \$ reduces the counter by two, and any further b reduces the counter by 1, until a \$ is seen again.
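To make the language of Example 3 concrete, the following sketch tests membership directly from the defining condition, without reference to the net of Figure 1. The function name `in_language` is hypothetical, and the reading of the definition used here (k ≥ 0 blocks, each possibly empty) is an assumption.

```python
import re


def in_language(word: str) -> bool:
    """Membership in L for words of the shape a^n $ b^{n_1} $ ... $ b^{n_k} $."""
    match = re.fullmatch(r"(a*)\$((?:b*\$)*)", word)
    if match is None:
        return False
    n = len(match.group(1))
    # Every block ends with '$', so splitting leaves one trailing empty string.
    sizes = [len(block) for block in match.group(2).split("$")[:-1]]
    if sum(sizes) <= n:
        return True
    # Second disjunct: the last block is exactly "bb" and the earlier blocks sum to n - 1.
    return bool(sizes) and sizes[-1] == 2 and sum(sizes[:-1]) == n - 1
```

For instance, the word `aa$b$bb$` exceeds the budget n = 2 but is still in the language through the second disjunct, which is the case where a resolver must prepare for a final block of two bs.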

## 3 Deciding History-Determinism

The main result of this section is that history-determinism of a given OCN is decidable, and that the problem is **PSPACE**-complete, as stated in the theorem below.

Theorem 4. *Given a one-counter net* $\mathcal{N}$*, checking if* $\mathcal{N}$ *is history-deterministic is* **PSPACE***-complete.*

Fig. 1. A history-deterministic OCN accepting L

The rest of this section is dedicated to the proof of the above statement.

The proof of the upper bound proceeds by a series of polynomial-time reductions, as shown below.

Deciding history-determinism



Deciding if ∃ wins simulation game

We shall define these games rigorously and prove these reductions in Subsection 3.1. Finally, since deciding the winner of the simulation game over one-counter nets is in **PSPACE** [24], this gives us the upper bound.

For the lower bound, we reduce from the problem of emptiness checking for alternating finite-state automata over a unary alphabet to deciding if ∃ wins the letter game.

#### 3.1 Token Games

Deciding history-determinism efficiently for finite-state parity automata over infinite words has been a major area of study in recent years. Bagnol and Kuperberg [11] gave a polynomial-time procedure for deciding history-determinism when the automaton accepts with a Büchi condition. Their underlying technique is a two-player game, called $G_2$ or the 2-token game, which they proved to be equivalent to the letter game when the automaton is Büchi. Boker, Kuperberg, Lehtinen, and Skrzypczak [12] extended this to show that the game $G_2$ is equivalent to the letter game when the automaton is co-Büchi as well. Deciding the winner in $G_2$ for an automaton of a fixed parity index takes polynomial time [12], and hence deciding history-determinism is polynomial when the parity automaton accepts words based on a Büchi or co-Büchi condition.

It is conjectured that winning $G_2$ is equivalent to winning the letter game for higher parity indices as well, and this is known as the $G_2$ conjecture [12]. Token games have also been instrumental in deciding history-determinism for quantitative automata, in the work of Boker and Lehtinen [17]. In their paper, they show that for finite words on a finite-state boolean automaton, history-determinism is characterised by $G_1$. This was later adapted to labelled transition systems with safety acceptance conditions, in the work of Henzinger, Lehtinen, and Totzke [21]. Thus, the 1-token game also characterises history-determinism for OCNs over finite words. We include a proof nonetheless, for the sake of completeness.

In a play of the letter game, ∀ picks the letters while ∃ picks the transitions, and the winning condition for ∃ is to produce an accepting run for any word that is in the language. Token games work similarly, but they impose more constraints on ∀. This is done by asking him to also display a valid run during the game with the help of some number of tokens. Here, we concentrate on the 1-token game $G_1$. The player ∀ wins the game $G_1$ if and only if he produces an accepting run, whilst ∃ produces a rejecting run. We make this more formal in the definition below.

Definition 5 (One token game $G_1$). *Let* $\mathcal{N} = (Q, \Sigma, \Delta, q_0, F)$ *be a one-counter net. The positions of the game* $G_1$ *on* $\mathcal{N}$ *are pairs of configurations in* $C(\mathcal{N}) \times C(\mathcal{N})$*, where the first configuration in the pair denotes the position of* ∃*'s token, and the second* ∀*'s token. The game starts with the initial position* $(c^\exists_0, c^\forall_0) = ((q_0, 0), (q_0, 0))$*. At the* $i$*th iteration of the play, where the position is* $(c^\exists_i, c^\forall_i)$*:*


*If* ∃ *is unable to choose a transition for her token whereas* ∀ *can choose a transition and extend the run on his token to an accepting run, then the game terminates and* ∃ *loses the game. However, irrespective of* ∃*'s ability to extend her run, if* ∀ *is unable to choose a transition for his token, then the game again terminates but* ∀ *loses the game.*

*If both players can extend their runs by picking a transition, and if ∀'s state in $c^{\forall}_{i+1}$ is accepting but ∃'s state in $c^{\exists}_{i+1}$ is rejecting, then again the game terminates and ∃ loses the game. Else, the game goes to $(c^{\exists}_{i+1}, c^{\forall}_{i+1})$ for another round of the play. We add that ∃ wins any infinite play.*

Letter games can be seen as a version of token games where ∀ plays with infinitely many tokens. We show in the following lemma that one-token games—even with this limited power of ∀—can capture letter games.

Lemma 6. *For an OCN $\mathcal{N}$, if ∃ wins the game $\mathcal{G}_1$ on $\mathcal{N}$, then ∃ has a winning strategy in the letter game.*

To prove the above lemma, we need to understand better the structure of the resolvers for OCNs. Consider the definition given below of *residual transitions*. Intuitively, these are transitions such that if there is an accepted word from a configuration whose first letter is $a$, then after taking a residual transition on $a$, the run can still be extended to an accepting run on the rest of the word from the new configuration. More formally, we say that a transition $(q, k) \xrightarrow{a,d} (q', k')$ is *residual* if $L(q', k') = a^{-1}L(q, k)$, where $L(q, k)$ (respectively $L(q', k')$) is the set of words accepted by $\mathcal{N}$ when the initial configuration is $(q, k)$ (respectively $(q', k')$) instead of $(q_0, 0)$. The proposition below shows that any winning strategy of ∃ can be characterised by these residual transitions.

Proposition 7. *For an OCN $\mathcal{N}$, an ∃ strategy $\sigma$ in the letter game is winning for ∃ if and only if $\sigma$ takes only residual transitions.*
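The residuality condition $L(q', k') = a^{-1}L(q, k)$ compares two infinite languages, so it cannot be tested by enumeration in general. As a rough illustration only, the sketch below compares the two sides on all words up to a fixed length, which gives a necessary condition. The net encoding and the function names are hypothetical.

```python
from itertools import product

# Hypothetical encoding: transitions are (source, letter, counter change, target).
NET = {
    "alphabet": "abc",
    "delta": [("q0", "a", 0, "l"), ("q0", "a", 0, "r"),
              ("l", "b", 0, "acc"), ("r", "c", 0, "acc")],
    "accepting": {"acc"},
}
DET = dict(NET, delta=[("q0", "a", 0, "l"), ("l", "b", 0, "acc")])


def accepts(net, config, word):
    """Does some run on `word` from `config` end in an accepting state?"""
    current = {config}
    for letter in word:
        current = {(q, k + d)
                   for (s, k) in current
                   for (p, a, d, q) in net["delta"]
                   if p == s and a == letter and k + d >= 0}
    return any(s in net["accepting"] for (s, _) in current)


def language_upto(net, config, n):
    """All accepted words of length at most n from `config`."""
    return {w
            for length in range(n + 1)
            for w in map("".join, product(net["alphabet"], repeat=length))
            if accepts(net, config, w)}


def looks_residual(net, config, letter, target, n):
    """Compare L(target) with letter^{-1} L(config) on words of length <= n."""
    left = language_upto(net, target, n)
    right = {w[1:] for w in language_upto(net, config, n + 1)
             if w.startswith(letter)}
    return left == right
```

In `NET`, the transition from `q0` to `l` on `a` discards the word `c` from the residual language, so `looks_residual(NET, ("q0", 0), "a", ("l", 0), 3)` is `False`; in the deterministic variant `DET` the same call returns `True`.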

Note that the letter game is a reachability game, so whichever player wins it has a positional winning strategy. Suppose that ∃ wins the letter game. Then ∃ has a winning strategy which can be given by a (partial) function $\sigma : (Q \times \mathbb{N}) \times \Sigma^{*} \times \Sigma \rightharpoonup \Delta$. Using Proposition 7, we can show that ∃'s strategy only depends on the configuration, and is independent of the word read so far.

Proposition 8. *If ∃ wins the letter game, then ∃ has a winning strategy $\sigma$ that only depends on the current configuration of the play, i.e. $\sigma$ is a partial function $\sigma : (Q \times \mathbb{N}) \times \Sigma \rightharpoonup \Delta$.*

Having shown that $\mathcal{G}_1$ is equivalent to the letter game, we show that deciding the winner in the game $\mathcal{G}_1$ is in **PSPACE**. This implies that history-determinism is decidable, and in **PSPACE**. We do so by reducing $\mathcal{G}_1$ to the simulation problem between two one-counter nets, which is known to be **PSPACE**-complete ([24], Theorem 7).

Given two OCNs $\mathcal{N}$ and $\mathcal{N}'$ at configurations $(q, n)$ and $(q', n')$, we say $\mathcal{N}'$ simulates $\mathcal{N}$ (or $\mathcal{N}$ is simulated by $\mathcal{N}'$) from their corresponding configurations if for any sequence of transitions from $(q, n)$, there is also a sequence of transitions from $(q', n')$ over the same letters which is built 'on-the-fly'. The alternation between universal and existential quantifiers in this statement makes the definition well suited to being captured by the following game between the players ∀ and ∃.

Definition 9 (Simulation Game). *Let $\mathcal{N} = (Q, \Sigma, \Delta, q_I, F)$ and $\mathcal{N}' = (Q', \Sigma, \Delta', q'_I, F')$ be two OCNs, and let $c = (p, k)$ and $c' = (p', k')$ be configurations in $\mathcal{C}(\mathcal{N})$ and $\mathcal{C}(\mathcal{N}')$ respectively, where $k, k' \in \mathbb{N}$. The* simulation game *between the OCNs $\mathcal{N}$ and $\mathcal{N}'$ at a position $(c, c')$, denoted by $\mathcal{G}((\mathcal{N}, c) \hookrightarrow (\mathcal{N}', c'))$, is a two-player game between ∀ and ∃, with positions in $\mathcal{C}(\mathcal{N}) \times \mathcal{C}(\mathcal{N}')$, where the initial position is $(c_0, c'_0) = (c, c')$. At round $i$ of the play, where the position is $(c_i, c'_i)$:*

– *∀ selects a letter $a \in \Sigma$ and a transition $c_i \xrightarrow{a,d} c_{i+1}$ in $\mathcal{N}$*

– *∃ selects an $a$-transition $c'_i \xrightarrow{a,d'} c'_{i+1}$ in $\mathcal{N}'$*

*If ∀ is unable to choose a transition, then ∀ loses the game immediately. If ∃ is unable to choose a transition but ∀ can select a transition and extend the run in $\mathcal{N}$ to an accepting run, then ∃ loses the game.*

*Otherwise, if ∀'s state in $c_{i+1}$ is accepting but ∃'s state in $c'_{i+1}$ is rejecting, then ∃ loses the game, and the game terminates. Else, the game goes to $(c_{i+1}, c'_{i+1})$ for another round of the play. The player ∃ wins any infinite play.*

If ∃ wins the above game, we say $(\mathcal{N}', (p', k'))$ simulates $(\mathcal{N}, (p, k))$, and we denote it by $(\mathcal{N}, (p, k)) \hookrightarrow (\mathcal{N}', (p', k'))$. Furthermore, we say $\mathcal{N}'$ simulates $\mathcal{N}$, or $\mathcal{N} \hookrightarrow \mathcal{N}'$, if $(\mathcal{N}, (q_I, 0)) \hookrightarrow (\mathcal{N}', (q'_I, 0))$.

As the simulation game is a reachability game over a countably sized arena, it is determined, and the winning player has a positional strategy. Thus, if ∃ wins the above simulation game $\mathcal{G}((\mathcal{N}, (p, k)) \hookrightarrow (\mathcal{N}', (p', k')))$, then ∃ has a positional winning strategy $\sigma_{\exists} : \mathcal{C}(\mathcal{N}) \times \mathcal{C}(\mathcal{N}') \times \Sigma \rightharpoonup \Delta'$.

*Remark 10.* In the literature on one-counter nets [29,24,30], the winning condition for the players in the simulation game is expressed differently, via the inability of the players to choose transitions, rather than via accepting states. The player ∀ (∃) loses the game if ∀ (∃) is unable to choose a transition. It can, however, be shown that the two versions of the simulation game are log-space reducible to each other.

Note the similarities (and differences) between $\mathcal{G}_1$ and the simulation game. In both, ∀ wins when his run is accepting while ∃'s run is rejecting. In $\mathcal{G}_1$, however, ∃ picks her transition first, while in the simulation game ∀ picks his transition first.

With some modifications to the structure of the underlying net in $\mathcal{G}_1$, we can ensure that the simulation game between the modified net and the original net captures $\mathcal{G}_1$. The intuition is that, in the simulation game, the net which is simulated is modified so that ∀ is forced to delay choosing his transition. This is formalized in the proof of the following lemma, and explained with a diagram in Figure 2.

Lemma 11. *Given a one-counter net $\mathcal{N}$, there are one-counter nets $\mathcal{M}$ and $\mathcal{M}'$, which have size at most polynomial in the size of $\mathcal{N}$, such that ∃ wins $\mathcal{G}_1$ on $\mathcal{N}$ if and only if ∃ wins $\mathcal{M} \hookrightarrow \mathcal{M}'$.*

*Proof.* (Sketch) For each run in $\mathcal{N}$, we have a run in $\mathcal{M}$ that lags behind by one transition. The one-counter net $\mathcal{M}'$, on the other hand, is relatively similar to $\mathcal{N}$. We impose this "one-transition lag" in $\mathcal{M}$ by construction: each transition chosen by ∀ in $\mathcal{M}$ corresponds to a letter along with a transition of $\mathcal{N}$, but this transition of $\mathcal{N}$ is over the letter that ∀ chose in the previous turn. The alternation between ∀ and ∃ in a play of the simulation game between the constructed nets $\mathcal{M}$ and $\mathcal{M}'$ corresponds exactly to the alternation between ∀ and ∃ in $\mathcal{G}_1$ over $\mathcal{N}$. Figure 2 captures the intuition behind this construction.

The net $\mathcal{M}'$ is linear in the size of $\mathcal{N}$, whereas $\mathcal{M}$ has size approximately $|\mathcal{N}| \times |\Sigma|$, where $|\Sigma|$ is the size of the alphabet. This factor of $|\Sigma|$ arises from remembering the previously read letter in the state space, which creates the lag for ∀'s decisions.

Fig. 2. An illustration of a play of G1, seen as a play of the simulation game

Finally, the following theorem from the work of Hofman, Lasota, Mayr, and Totzke [24] shows that the winner of a simulation game can be decided in **PSPACE**. We restate their result below in our notation.

Theorem 12 ([24], Theorem 7). *Given two one-counter nets $\mathcal{N}$ and $\mathcal{N}'$, with configurations $(p, k)$ and $(p', k')$ in $\mathcal{C}(\mathcal{N})$ and $\mathcal{C}(\mathcal{N}')$ respectively, with $k$ and $k'$ represented in binary, deciding whether $(\mathcal{N}', (p', k'))$ simulates $(\mathcal{N}, (p, k))$ is in* **PSPACE***. Moreover, the set of $(k, k')$ for which $(\mathcal{N}, (p, k)) \hookrightarrow (\mathcal{N}', (p', k'))$ is semilinear, and can be computed in* **EXPSPACE***.*

We get the following lemma as a corollary of Lemmas 6 and 11 and Theorem 12.

Lemma 13. *Given a one-counter net* N *, we can decide in* **PSPACE** *if* N *is history-deterministic.*

#### 3.2 Lower Bounds

Although solving the simulation game turns out to be **PSPACE**-complete itself from the work of Srba [29], this lower bound result does not work for our reduction to simulation games. The reduction we give from G<sup>1</sup> to simulation games produces only a restricted class of simulation games which solve G1.

Nevertheless, we show that deciding history-determinism is still **PSPACE**-hard, showing that even this restriction of the simulation problem is enough to induce **PSPACE**-hardness.

Lemma 14. *Given a one-counter net* N *, it is* **PSPACE***-hard to decide if* N *is history-deterministic.*

Proof (Sketch). We reduce from the problem of checking non-emptiness of an alternating finite-state automaton over a unary alphabet. This problem was proven to be **PSPACE**-complete by Holzer [31], with its proof simplified by Jančar and Sawa [32]. The intuition behind the reduction is to recreate a run of the alternating automaton in the letter game of a constructed OCN. In the letter game, a "fair" play of ∀ corresponds to a branch of a run-tree in the automaton, with ∃ resolving universal transitions and ∀ resolving existential ones. The player ∀ can ensure that he wins the letter game if and only if the alternating automaton has some word that he can demonstrate is in the language. If ∀ plays "unfairly", then there are gadgets to ensure that ∃ automatically wins.
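For readers unfamiliar with the source problem of this reduction, the following sketch decides non-emptiness of a unary alternating automaton by brute force. It iterates over the sets $S_n$ of states that accept $a^n$, which takes exponential time in the worst case and is therefore not the **PSPACE** algorithm. The encoding (`kind`, `succ`) is hypothetical, and treating a state without successors as accepting no non-empty word is an assumed convention.

```python
def unary_afa_nonempty(states, initial, accepting, kind, succ):
    """Brute-force non-emptiness for an alternating automaton over {a}.

    kind[q] is "or" (existential) or "and" (universal);
    succ[q] lists the successors of q on the single letter.
    """
    current = frozenset(accepting)        # S_0: states accepting the empty word
    seen = set()
    while current not in seen:            # S_n repeats within 2^|states| steps
        if initial in current:
            return True
        seen.add(current)
        # S_{n+1}: states from which one more letter leads into S_n.
        current = frozenset(
            q for q in states
            if succ[q] and (any if kind[q] == "or" else all)(
                s in current for s in succ[q]))
    return False


# Hypothetical instance: state 0 is universal and needs both 1 and 2 to accept.
STATES = [0, 1, 2]
KIND = {0: "and", 1: "or", 2: "or"}
SUCC_OK = {0: [1, 2], 1: [2], 2: [2]}     # accepts a^2
SUCC_BAD = {0: [1, 2], 1: [1], 2: []}     # state 1 never reaches acceptance
```

Here `unary_afa_nonempty(STATES, 0, {2}, KIND, SUCC_OK)` holds, since both branches from state 0 accept after two letters, while the same call with `SUCC_BAD` fails.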

## 4 Languages and History-Determinism in OCNs

We dedicate this section to tackling different questions about languages accepted by history-deterministic one-counter nets and decision problems on such languages.

### 4.1 Languages Accepted by History-Deterministic OCNs

While in history-deterministic models we are able to resolve the non-determinism on-the-fly, it is not well understood what these resolvers might look like in general. In fact, Guha, Jecker, Lehtinen, and Zimmermann showed that there are history-deterministic pushdown automata whose resolvers cannot be given by a pushdown automaton [18], and whether such a resolver can be computed is an open problem.

In this subsection, our goal is to understand better the languages of history-deterministic OCNs. As a first step towards this goal, we already have some intuition from the previous section on the eventually periodic nature of the transitions that are residual (as a corollary of Lemma 11 and Theorem 12). Here, we solidify this intuition by defining what it means for a net to have the semilinear-strategy property, and then showing that all nets have this property. For history-deterministic nets, we use this semilinearity of the resolvers to show the existence of a language-equivalent deterministic OCA.

We first give a sufficient condition, which we call the semilinear-strategy property, for a given history-deterministic one-counter net to be determinisable.

We say a transition $\delta = (p, a, d, p')$ in a one-counter net $\mathcal{N}$ is a *good transition* at $(p, k)$ if $((p, k), (p, k))$ is in the winning region of $\mathcal{G}_1$, and the transition $(p, k) \xrightarrow{a,d} (p', k + d)$ is a winning move for ∃ in $\mathcal{G}_1$ when ∀ chooses the letter $a$. We also sometimes say that $(p, k) \xrightarrow{a,d} (p', k + d)$ is a good transition in $\mathcal{N}$. The following lemma can be seen as a weakening of Proposition 7.

Lemma 15. Let $\mathcal{N} = (Q, \Sigma, \Delta, q_0, F)$ be a history-deterministic one-counter net. An ∃ strategy $\sigma$ in the letter game is winning for ∃ if and only if $\sigma$ only takes good transitions $(p, k) \xrightarrow{a,d} (p', k')$.

Given a one-counter net $\mathcal{N}$, we say $\mathcal{N}$ satisfies the *semilinear-strategy property* if for each transition $\delta = (q, a, d, q')$, the set of $k \in \mathbb{N}$ such that $\delta$ is a good transition at $(q, k)$ is semilinear. That is, for each transition $\delta = (q, a, d, q') \in \Delta$, the following set is eventually periodic:

$$S_{\delta} = \{\, k : (q, k) \xrightarrow{a,d} (q', k') \text{ is a good transition at } (q, k) \,\}.$$
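A semilinear subset of $\mathbb{N}$ is exactly an eventually periodic one, so each set $S_{\delta}$ has a finite description: an arbitrary finite part below some threshold, followed by a pattern that repeats with a fixed period. The class below is a minimal sketch of such a description; the field names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventuallyPeriodicSet:
    """A subset of the naturals that is periodic from `threshold` onwards."""
    threshold: int         # values below this are listed explicitly
    initial: frozenset     # members k with k < threshold
    period: int            # length of the repeating pattern (>= 1)
    residues: frozenset    # r in [0, period): threshold + r + j*period is a member

    def __contains__(self, k):
        if k < self.threshold:
            return k in self.initial
        return (k - self.threshold) % self.period in self.residues


# Counter values that are not multiples of 7.
NON_MULTIPLES_OF_7 = EventuallyPeriodicSet(0, frozenset(), 7, frozenset(range(1, 7)))

# {0, 2} below the threshold 3, then every second value starting at 3.
MIXED = EventuallyPeriodicSet(3, frozenset({0, 2}), 2, frozenset({0}))
```

Membership then takes constant time, for example `15 in NON_MULTIPLES_OF_7` holds while `14 in NON_MULTIPLES_OF_7` does not.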

Lemma 16. *If a history-deterministic OCN $\mathcal{N} = (Q, \Sigma, q_0, \Delta, F)$ satisfies the semilinear-strategy property, then there is a language-equivalent deterministic OCA $\mathcal{D}$.*

*Proof (Sketch).* Assume that the history-deterministic OCN $\mathcal{N}$ satisfies the semilinear-strategy property. We first construct a non-deterministic one-counter automaton $\mathcal{B}$, which can be determinised easily by removing a minimal set of transitions to get rid of non-determinism while still preserving the language. The automaton $\mathcal{B}$ is designed so that its transitions correspond to the good transitions in $\mathcal{N}$ from any configuration. The eventual periodicity of the sets $S_{\delta}$ allows us to express this as a one-counter automaton, rather than as a labelled transition system with countably many states.

Intuitively, the automaton B is constructed such that the state space of the automaton stores in its memory the period and the initial block of the semilinear sets. The idea is that this automaton's runs would be in bijection with those runs that take only good transitions in the OCN N . We know that such a run exists in N by Lemma 15, as N is history-deterministic. However, the counter values in B are 'scaled down' to only remember how many periods have passed, while counter value 0 indicates that the counter value in the original run would have been at most I. The exact value of the counter in a run of N can be inferred as a function of the state space and the counter value of B.
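The 'scaling down' of counter values can be pictured as follows. This is only one plausible encoding, written under the assumption of a single initial block length `initial_block` (the $I$ above) and a single `period`; the paper's actual construction may store the information differently. Small values live entirely in the finite memory with counter 0, and larger values are split into an offset within the period (kept in memory) and a number of periods (kept in the counter).

```python
def scale_down(k, initial_block, period):
    """Split a counter value of N into (finite memory, counter value of B)."""
    if k <= initial_block:
        return ("low", k), 0                 # counter 0 means k <= initial_block
    excess = k - initial_block - 1
    return ("high", excess % period), 1 + excess // period


def scale_up(memory, counter, initial_block, period):
    """Recover the exact counter value of N from the pair used in B."""
    kind, offset = memory
    if kind == "low":
        return offset
    return initial_block + 1 + (counter - 1) * period + offset
```

With `initial_block = 3` and `period = 7`, the value 4 is stored as `(("high", 0), 1)` and the value 11 as `(("high", 0), 2)`, and `scale_up` inverts `scale_down` on every value.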

Having shown that every history-deterministic one-counter net that satisfies the semilinear-strategy property has a language-equivalent DOCA, we proceed to show that every one-counter net satisfies the semilinear-strategy property. We first give an example that illustrates the intuition behind this statement.

*Example 17.* Consider the net $\mathcal{N}_7$, as shown in Figure 3, where all states labelled $q_F$ are accepting. This net is not history-deterministic. However, if the counter value at $q_1$ is not a multiple of 7, then ∃ can resolve the non-determinism from $q_1$. Observe that the net accepts words of the form a<sup>n</sup>\$b<sup>k</sup>\$ · (♥, ♣) such that k ≤ n. Consider the following play of ∀ in the letter game from $q_0$: for 7n steps he reads a, after which he reads a \$. So far, all transitions are deterministic. After that, assume he reads the letter b, again 7n many times. This ensures that the run ends at the state $q_1$ with counter value 0. If he reads \$ here, this is the only position where ∃ has a choice. Note that she has to choose between transitions leading to $q_♥$ and $q_♣$. However, since both the suffixes ♥ and ♣ are accepting and only one of ♥ or ♣ is accepting from either state, ∀ can ensure ∃ loses no matter what she picks. However, if ∀ had read a number of 'b's that was not a multiple of 7, the play of an accepting word would end at $q_\$$, which is accepting.

Fig. 3. The one-counter net $\mathcal{N}_7$ from Example 17

Lemma 18. *Every one-counter net $\mathcal{N}$ satisfies the semilinear-strategy property.*

The proof of the above lemma follows from Theorem 12, using a construction similar to the one in the proof of Lemma 11. As an easy corollary of the above two lemmas, we get the following theorem.

Theorem 19. *Every history-deterministic OCN can be determinised to produce an equivalent deterministic OCA.*

An easy analysis of our proof combined with the results on the representation of simulation preorder ([24], Lemma 28) shows a doubly exponential upper bound on the size of the equivalent deterministic OCA constructed from the proof of the theorem above.

*Remark 20.* On the topic of expressivity of history-determinism, we conclude this subsection with a remark that history-deterministic OCNs are strictly less expressive than non-deterministic OCNs. This can be demonstrated with the language $L = \{a^i \$ b^j \$ b^k \mid j \le i \text{ or } k \le i\}$. It is routine to verify that such a language is not accepted by any history-deterministic OCN, but this language can be accepted by a non-deterministic OCN. Note that history-determinism itself is not the limiting factor in accepting this language, as this language is accepted by a history-deterministic pushdown automaton [18].
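As a quick reference for the language of Remark 20, the following sketch is a plain membership test. It does not model any automaton: it simply parses a word and compares the block lengths, which makes visible that a non-deterministic net would have to guess, at the first \$, whether to compare $i$ with $j$ or with $k$. The function name `in_L` is hypothetical.

```python
import re

# Words of the shape a^i $ b^j $ b^k.
SHAPE = re.compile(r"(a*)\$(b*)\$(b*)")


def in_L(word):
    """Membership in L = { a^i $ b^j $ b^k | j <= i or k <= i }."""
    match = SHAPE.fullmatch(word)
    if match is None:
        return False
    i, j, k = (len(group) for group in match.groups())
    return j <= i or k <= i
```

For example, `in_L("aa$b$bbb")` holds through the first block of `b`s, `in_L("a$bb$b")` holds through the second, and `in_L("a$bb$bbb")` fails.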

### 4.2 Complexity of comparing languages of history-deterministic OCNs

Comparisons between languages of non-deterministic OCNs are undecidable [23], and even the restricted question of universality is Ackermann-complete [33]. In this section, we show that for history-deterministic nets, these problems are decidable and have a significantly lower complexity than for non-deterministic nets.

Although we show that history-deterministic OCNs can be converted to deterministic OCAs, this determinisation does not help us answer these questions. This is because for deterministic OCAs, the problem of inclusion is undecidable [28]. Even though equality and universality for deterministic OCAs are **NL**-complete [34], the deterministic OCA we get from determinisation of a history-deterministic OCN could be much larger than our input net, leading to a larger complexity.

Nevertheless, we show that checking for language inclusion, and hence checking language equivalence between two history-deterministic one-counter nets is in **PSPACE**. This is done by giving a polynomial-time reduction to the problem of deciding history-determinism, which we showed to be in **PSPACE** in Lemma 13. Moreover, combining our techniques with results of Kucera [35] gives us decidability in **P** for checking language universality of HD-OCNs.

Lemma 21. *Deciding language inclusion and language equivalence between two history-deterministic one-counter nets is in* **PSPACE***.*

We can show that the problem of checking language inclusion between two history-deterministic OCNs reduces to checking if a larger OCN (linear in the sum of the size of the two OCNs) is history-deterministic. Since language equivalence is essentially checking language inclusion both ways, we have the above results.

Lemma 22. *Deciding language universality for a given history-deterministic one-counter net is in* **P***.*

The problem of universality reduces to checking if the input net $\mathcal{M}$ simulates a finite-state automaton. This problem was shown to be in **P** by Kucera ([35], Lemma 2), showing that universality is in **P**.

We therefore have the following theorem.

Theorem 23. *For nets $\mathcal{H}$ and $\mathcal{H}'$ that are history-deterministic, the problem of checking if $L(\mathcal{H}) \subseteq L(\mathcal{H}')$ as well as checking if $L(\mathcal{H}) = L(\mathcal{H}')$ can be done in* **PSPACE***. If $\mathcal{H}$ is instead a deterministic finite-state automaton, this problem can be solved in* **P***.*

We summarise the known complexity results, for comparison with other automata models, in Table 1.

## 5 Extensions and Variations of OCN

We revisit the question of deciding history-determinism in this section for one-counter nets and their variants. In the first subsection, we tackle the question of how the complexity changes if the nets are encoded succinctly. We show that, as expected, this increases the complexity of the problem from **PSPACE**-complete to **EXPSPACE**-complete. We then show that adding zero-tests does add too much power to one-counter nets: the problem of deciding history-determinism becomes undecidable.



Table 1. Complexities for the problems of deciding language inclusion, equivalence and universality over deterministic OCN, HD-OCN, non-deterministic OCN and deterministic OCA.

#### 5.1 Succinct Encoding of Counters

We consider a succinct representation of the input nets or a succinct one-counter net, where the transitions can allow for increments or decrements by integers (potentially greater than 1) that are represented in binary. Unsurprisingly, we show that checking for history-determinism becomes **EXPSPACE**-complete in this case. The upper bound follows from the previous proof of the **PSPACE** upper bound from Lemma 13 of deciding history-determinism for one-counter nets, where counter values are in unary. Any succinct one-counter net can be converted with only an exponential blow-up into another language equivalent net with unary encoding, preserving history-determinism thereby giving us an **EXPSPACE** upper bound.

Proposition 24. Given a succinct one-counter net $\mathcal{N}$, deciding if $\mathcal{N}$ is history-deterministic is in **EXPSPACE**.
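The conversion from succinct to unary encoding is not spelled out here. One possible way to carry it out while staying realtime, given purely as an assumption for illustration and not necessarily the construction the authors have in mind, is to keep the remainder of the counter modulo the largest update $D$ in the state, and to let the counter hold the quotient. Every update then changes the new counter by at most one, and the number of states grows by the factor $D$, which is exponential in the bit length of the updates.

```python
def to_unary(delta):
    """Turn transitions (p, a, d, q) with arbitrary integer d into unit updates.

    A configuration (q, v) of the succinct net corresponds to the
    configuration ((q, v % D), v // D) of the resulting net.
    """
    D = max(1, max(abs(d) for (_, _, d, _) in delta))
    unary = []
    for (p, a, d, q) in delta:
        for r in range(D):
            total = r + d
            if total < 0:                      # borrow one block of size D
                unary.append(((p, r), a, -1, (q, total + D)))
            elif total >= D:                   # carry one block of size D
                unary.append(((p, r), a, +1, (q, total - D)))
            else:
                unary.append(((p, r), a, 0, (q, total)))
    return unary, D


def run(delta, state, counter, word):
    """Follow the first enabled transition per letter; None if the run blocks."""
    for letter in word:
        options = [(q, counter + d) for (p, a, d, q) in delta
                   if p == state and a == letter and counter + d >= 0]
        if not options:
            return None
        state, counter = options[0]
    return state, counter


# Hypothetical succinct net: `a` adds 5 and `b` subtracts 3.
SUCCINCT = [("q", "a", 5, "q"), ("q", "b", -3, "q")]
UNARY, D = to_unary(SUCCINCT)
```

Reading `ab` leads the succinct net to counter value 2 and the converted net to state `("q", 2)` with counter 0, while `abb` blocks in both, since the value would become negative.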

However, much more work is needed to show a matching lower bound, which we do by giving a reduction from reachability games on succinct one-counter nets (SOCN). Intuitively, these games are played on the configuration graph of a succinct OCN whose alphabet is a singleton. The states of this SOCN are partitioned between two players, denoted by ∧ and ∨, who are responsible for choosing the transition from their respective states. The goal of ∨ is to take the play to a designated winning state with counter value 0. The problem of deciding the winner in the SOCN-reachability game was shown to be **EXPSPACE**-complete by Hunter [37] and later, several of its variants were also shown to have the same complexity [30]. A polynomial reduction from this problem to checking history-determinism of a SOCN gives us **EXPSPACE**-hardness.

Lemma 25. Given a SOCN $\mathcal{N}$, deciding if $\mathcal{N}$ is history-deterministic is **EXPSPACE**-hard.

Proof (Sketch). Given an instance of a SOCN-reachability game on $\mathcal{N}$, we construct a SOCN $\mathcal{M}$ such that ∨ wins the SOCN-reachability game on $\mathcal{N}$ if and only if ∀ wins the letter game on $\mathcal{M}$.

The high-level idea of the construction is such that in a play of the letter game on M, the players ∀ and ∃ create a transcript of a run of N . This is done by ∀ ensuring that picking the letters in M corresponds to picking a transition out of ∨ states in N . Since ∃ resolves the non-determinism in the letter game on M, her choice of transitions correspond to transitions out of a ∧ state in the SOCN-reachability game.

However, there are some subtleties in the construction, as we need to ensure a few important properties while constructing $\mathcal{M}$. Firstly, any sequence of letters chosen by ∀ in the letter game on $\mathcal{M}$ must correspond to a run in $\mathcal{N}$. Secondly, the interplay between ∃'s and ∀'s choices in the letter game on $\mathcal{M}$ must correspond to the choices of the players ∧ and ∨ respectively in the SOCN-reachability game on $\mathcal{N}$. These are the main challenges in constructing such a SOCN $\mathcal{M}$, and they are resolved by the use of a few gadgets.

We conclude this subsection by combining Proposition 24 and Lemma 25 to obtain the following theorem.

Theorem 26. *Given a SOCN* N *, deciding if* N *is history-deterministic is* **EXPSPACE***-complete.*

#### 5.2 Deciding History-Determinism for OCA

We show that, given a one-counter automaton $\mathcal{A}$, deciding if $\mathcal{A}$ is history-deterministic is undecidable. It was shown by Guha, Jecker, Lehtinen, and Zimmermann [18] that deciding if a given non-deterministic pushdown automaton is history-deterministic is undecidable. We extend their result to OCAs via a reduction from language inclusion for deterministic one-counter automata (DOCA), which is known to be undecidable [28].

Theorem 27. *Given an OCA* A*, deciding if* A *is history-deterministic is undecidable.*

*Proof (Sketch).* Consider the following problem:

*DOCA Inclusion:* Given two DOCAs A and B, is L(A) ⊆ L(B)?

The above problem was shown to be undecidable in Section 5.1 of Valiant's thesis [28]. We show that the problem of deciding if a given one-counter automaton is history-deterministic is also undecidable, by giving a reduction from the DOCA inclusion problem to checking for history-determinism of a given OCA.

## 6 Discussion

We showed several decision problems related to history-determinism to be decidable over OCNs. This is unlike other classes of infinite-state systems that subsume them, where either some or all of these problems are undecidable.

We note that we only deal with realtime nets with no ε-transitions, but our results hold without too much modification when ε-transitions are present, as weak simulation over OCNs can be decided in **PSPACE** (and **EXPSPACE** for a succinct encoding), and the weak simulation pre-order is semilinear as well [24]. We also showed that testing the counter for zero makes checking for history-determinism undecidable. Along these lines, one could ask about models like reversal-bounded one-counter automata [38], or automata with a bounded number of zero-tests, to gauge the frontier between decidability and undecidability on these systems.

Although not obvious from the main part of the paper, we are confident that our results could easily be extended to safety acceptance conditions. One could also ask, for instance, to look at reachability or Büchi and co-Büchi acceptance conditions and understand how history-determinism works in these models.

There are several questions about the expressivity of history-deterministic OCNs which we believe need further study. Overloading the notation and assuming DOCN, DOCA, OCN, HD-OCN and HD-OCA to denote the class of languages that are accepted by the corresponding models, we have shown that

$$\text{DOCN} \subseteq \text{HD-OCN} \subseteq \text{OCN} \cap \text{DOCA}.$$

An interesting problem would be to prove or disprove if any of these inclusions are strict. In fact, we don't have an example of a language that is accepted by a history-deterministic OCN which is not accepted by a deterministic OCN.

One could ask similar questions about the expressivity of history-determinism in OCAs, i.e. whether HD-OCA = DOCA. Although deciding history-determinism is undecidable, it might be possible to show that history-deterministic OCAs are exactly as expressive as deterministic OCAs. We remark that the 1-token game $\mathcal{G}_1$ characterises history-determinism for OCAs as well. Moreover, we can again show with similar techniques that if history-deterministic OCAs satisfy the semilinear-strategy property, then their languages can also be expressed by a deterministic OCA. The key part that we need to prove for determinisation of history-deterministic OCAs would be the semilinear-strategy property. It would be interesting to see what such a proof would look like, given that checking for history-determinism is undecidable for OCAs.

Acknowledgements We would like to thank Dmitry Chistikov for listening to our conjectures and pointing us to important references. We are also grateful for his comments on our introduction. We are thankful to Neha Rino for carefully proofreading our paper, and suggesting improvements in our presentation. We also thank Sougata Bose, Piotrek Hofman, Filip Mazowiecki, David Purser, and Patrick Totzke for their insightful remarks on our draft, and for telling us about weak simulation. We are grateful to Shaull Almagor and Asaf Yeshurun for a fun talk about OCNs. Finally, we thank Marcin Jurdziński for his support, and for bringing us homemade rhubarb crumble.

## References

1. Thomas Colcombet. Unambiguity in automata theory. In Jeffrey O. Shallit and Alexander Okhotin, editors, *Descriptional Complexity of Formal Systems - 17th International Workshop, DCFS 2015, Waterloo, ON, Canada, June 25-27, 2015. Proceedings*, volume 9118 of *Lecture Notes in Computer Science*, pages 3–18. Springer, 2015.



## Unboundedness Problems for Machines with Reversal-Bounded Counters

Pascal Baumann<sup>1</sup>, Flavio D'Alessandro<sup>2</sup>, Moses Ganardi<sup>1</sup>, Oscar Ibarra<sup>3</sup>, Ian McQuillan<sup>4</sup>, Lia Schütze<sup>1</sup>, and Georg Zetzsche<sup>1</sup>

<sup>1</sup> Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany

<sup>2</sup> Dept. of Mathematics G. Castelnuovo, Sapienza University of Rome, Rome, Italy

<sup>3</sup> Dept. of Computer Science, University of California, Santa Barbara, CA, USA

<sup>4</sup> Dept. of Computer Science, University of Saskatchewan, Saskatoon, Canada

Abstract. We consider a general class of decision problems concerning formal languages, called "(one-dimensional) unboundedness predicates", for automata that feature reversal-bounded counters (RBCA). We show that each problem in this class reduces—non-deterministically in polynomial time—to the same problem for just finite automata. We also show an analogous reduction for automata that have access to both a pushdown stack and reversal-bounded counters (PRBCA).

This allows us to answer several open questions: For example, we show that it is coNP-complete to decide whether a given (P)RBCA language $L$ is bounded, meaning whether there exist words $w_1, \ldots, w_n$ with $L \subseteq w_1^{*} \cdots w_n^{*}$. For PRBCA, even decidability was open. Our methods also show that there is no language of a (P)RBCA of intermediate growth. This means that the number of words of each length grows either polynomially or exponentially. Part of our proof is likely of independent interest: We show that one can translate an RBCA into a machine with $\mathbb{Z}$-counters in logarithmic space, while preserving the accepted language.

Keywords: Formal languages · Decidability · Complexity · Counter automata · Reversal-bounded · Pushdown · Boundedness · Unboundedness

## 1 Introduction

A classic idea in the theory of formal languages is the concept of boundedness of a language. A language $L$ over an alphabet $\Sigma$ is called bounded if there exist a number $n \in \mathbb{N}$ and words $w_1, \ldots, w_n \in \Sigma^{*}$ such that $L \subseteq w_1^{*} \cdots w_n^{*}$. What makes boundedness important is that a rich variety of algorithmic problems become decidable for bounded languages. For example, when Ginsburg and Spanier [25] introduced boundedness in 1964, they already showed that given two context-free languages, one of them bounded, one can decide inclusion [25, Theorem 6.3]. This is because if $L \subseteq w_1^{*} \cdots w_n^{*}$ for a context-free language $L$, then the set $\{(x_1, \ldots, x_n) \in \mathbb{N}^n \mid w_1^{x_1} \cdots w_n^{x_n} \in L\}$ is effectively semilinear, which permits expressing inclusion in Presburger arithmetic. Here, boundedness is a crucial assumption: Hopcroft has shown that if $L_0 \subseteq \Sigma^{*}$ is context-free, then the problem of deciding $L_0 \subseteq L$ for a given context-free language $L$ is decidable if and only if $L_0$ is bounded [35, Theorem 3.3].
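To make the definition of boundedness concrete, the sketch below tests whether individual words lie in $w_1^{*} \cdots w_n^{*}$ for given words $w_1, \ldots, w_n$. It only checks a finite sample of a language against a candidate tuple of words; it says nothing about how to find such words or how to decide boundedness of an automaton's language. The function names are hypothetical.

```python
import re


def in_bounded_product(word, ws):
    """Is `word` in w_1* w_2* ... w_n* for the words in `ws`?"""
    pattern = "".join(f"(?:{re.escape(w)})*" for w in ws)
    return re.fullmatch(pattern, word) is not None


def sample_is_bounded_by(sample, ws):
    """Check the inclusion of a finite sample of words in w_1* ... w_n*."""
    return all(in_bounded_product(word, ws) for word in sample)
```

For instance, every word $a^n b^n$ lies in $a^{*} b^{*}$, so `sample_is_bounded_by(["", "ab", "aabb"], ["a", "b"])` holds, whereas `in_bounded_product("abca", ["ab", "c"])` fails.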

The idea of translating questions about bounded languages into Presburger arithmetic has been applied in several other contexts. For example, Esparza, Ganty, and Majumdar [20] have shown that many classes of infinite-state systems are perfect modulo bounded languages, meaning that the bounded languages form a subclass that is amenable to many algorithmic problems. As another example, the subword ordering has a decidable first-order theory on bounded context-free languages [45], whereas on languages of the form $\Sigma^*$, even the existential theory is undecidable [33]. This, in turn, implies that initial limit Datalog is decidable for the subword ordering on bounded context-free languages [7]. Finally, bounded context-free languages can be closely approximated by regular ones [16].

This raises the question of how one can decide whether a given language is bounded. For context-free languages this problem is decidable [25, Theorem 5.2(a)] in polynomial time [24, Theorem 19].

Boundedness for RBCA. Despite the importance of boundedness, it had been open for many years [9, 17]<sup>1</sup> whether boundedness is decidable for one of the most well-studied types of infinite-state systems: reversal-bounded (multi-)counter automata (RBCA). These are machines with counters that can be incremented, decremented, and even tested for zero. However, in order to achieve decidability of basic questions, there is a bound on the number of times each counter can reverse, that is, switch between incrementing and decrementing phases. They were first studied in the 1970s [2, 36] and have received a lot of attention since [8–13, 18, 23, 28, 32, 33, 39–41, 58]. The desirable properties mentioned above for bounded context-free languages also apply to bounded RBCA. Furthermore, any bounded language accepted by an RBCA (even one augmented with a stack) can be effectively determinized [38] (see also [9, 11]), opening up even more avenues to algorithmic analysis. This makes it surprising that decidability of boundedness remained open for many years.

Decidability of boundedness for RBCA was settled in [15], which proves boundedness decidable even for the larger class of vector addition systems with states (VASS), with acceptance by configuration. However, the results from [15] leave several aspects unclarified, which we investigate here:


<sup>1</sup> Note that [9] is about Parikh automata, which are equivalent to RBCA.

Q3: Are there languages of RBCA of intermediate growth? As far as we know, this is a long-standing open question in itself [37]. The growth of a language $L \subseteq \Sigma^*$ is the counting function $g_L \colon \mathbb{N} \to \mathbb{N}$, where $g_L(n)$ is the number of words of length $n$ in $L$. This concept is closely tied to boundedness: For regular and context-free languages, it is known that a language has polynomial growth if and only if it is bounded (and it has exponential growth otherwise). A language is said to have intermediate growth if it has neither polynomial nor exponential growth.

Contribution I: We prove versions of one of the main results in [15], one for RBCA and one for PRBCA. Specifically, the paper [15] not only shows that boundedness is decidable for VASS, but it introduces a general class of unboundedness predicates for formal languages. It is then shown in [15] that any unboundedness predicate is decidable for VASS if and only if it is decidable for regular languages. Our first two main results are:


However, it should be noted that our results only apply to those unboundedness predicates from [15] that are one-dimensional. Fortunately, these are enough for our applications. These results allow us to settle questions (Q1)–(Q3) above and derive the exact complexity of several other problems. It follows that boundedness for both RBCA and PRBCA is coNP-complete, thus answering (Q1) and (Q2). Furthermore, the proof shows that if boundedness of a PRBCA does not hold, then its language has exponential growth. This implies that there are no RBCA languages of intermediate growth (thus settling (Q3)), and even that the same holds for PRBCA. In particular, deciding polynomial growth of (P)RBCA is coNP-complete and deciding exponential growth of (P)RBCA is NP-complete. We can also derive from our result that deciding whether a (P)RBCA language is infinite is NP-complete (but this also follows easily from [32], see Section 2). Finally, our results imply that it is PSPACE-complete to decide if an RBCA language L ⊆ Σ<sup>∗</sup> is factor universal, meaning it contains every word of Σ<sup>∗</sup> as a factor (i.e. as an infix). Whether this problem is decidable for RBCA was also left as an open problem in [17, 18] (under the name infix density).

We prove our results (MR1) and (MR2) by first translating (P)RBCA into models that have Z-counters instead of reversal-bounded counters. A Z-counter is one that can be incremented and decremented, but cannot be tested for zero. Moreover, it can assume negative values. With these counters, acceptance is defined by reaching a configuration where all counters are zero (in particular, the acceptance condition permits a single zero-test on each counter). Here, finite automata with Z-counters are called Z-VASS [29]. Z-counters are also known as blind counters [26] and it is a standard fact that RBCA are equivalent (in terms of accepted languages) to Z-VASS [26, Theorem 2].


Table 1. Complexity results. The completeness statements are meant with respect to deterministic logspace reductions.

Despite the equivalence between RBCA and Z-VASS being so well-known, there was apparently no known translation from RBCA to Z-VASS in polynomial time. Here, the difficulty stems from simulating zero-tests (which can occur an unbounded number of times in an RBCA): To simulate these, the Z-VASS needs to keep track of which counter has completed which incrementing/decrementing phase, using only polynomially many control states. It is also not obvious how to employ the Z-counters for this, as they are only checked in the end.

Contribution II: As the first step of showing (MR1), we show that

MR3: RBCA can be translated (preserving the language) into Z-VASS in logarithmic space.

This also implies that translations to and from another equivalent model, Parikh automata [41], are possible in polynomial time: It was recently shown that Parikh automata (which have received much attention in recent years [6, 8–10, 13, 22]) can be translated in polynomial time into Z-VASS [30]. Together with our new result, this implies that one can translate among RBCA, Z-VASS, and Parikh automata in polynomial time. Furthermore, our result yields a logspace translation of PRBCA into Z-grammars, an extension of context-free grammars with Z-counters. The latter is the first step for (MR2).

## 2 Main Results: Unboundedness and (P)RBCA

Reversal-bounded counter automata and pushdowns. A pushdown automaton with $k$ counters is a tuple $\mathcal{A} = (Q, \Sigma, \Gamma, q_0, T, F)$ where $Q$ is a finite set of states, $\Sigma$ is an input alphabet, $\Gamma$ is a stack alphabet, $q_0 \in Q$ is an initial state, $T$ is a finite set of transitions $(p, w, \mathit{op}, q) \in Q \times \Sigma^* \times \text{Op} \times Q$, and $F \subseteq Q$ is a set of final states. Here Op is defined as

$$\text{Op} = \{ \mathsf{inc}\_i, \mathsf{dec}\_i, \mathsf{zero}\_i, \mathsf{nz}\_i \mid 1 \le i \le k \} \cup \Gamma \cup \bar{\Gamma} \cup \{ \varepsilon \},$$

containing counter and stack operations. Here $\bar{\Gamma} = \{\bar{\gamma} \mid \gamma \in \Gamma\}$ is a disjoint copy of $\Gamma$. A configuration is a tuple $(p, \alpha, \mathbf{v}) \in Q \times \Gamma^* \times \mathbb{N}^k$. We write $(p, \alpha, \mathbf{u}) \xrightarrow{w} (p', \alpha', \mathbf{u}')$ if there is a transition $(p, w, \mathit{op}, p') \in T$ such that one of the following holds:

– $\mathit{op} = \mathsf{inc}_i$, $\mathbf{u}' = \mathbf{u} + \mathbf{e}_i$, and $\alpha' = \alpha$, where $\mathbf{e}_i \in \mathbb{N}^k$ is the $i$-th unit vector,


We extend this notation to longer runs in the natural way.

A $(k, r)$-PRBCA (pushdown reversal-bounded counter automaton) $(\mathcal{A}, r)$ consists of a pushdown automaton $\mathcal{A}$ with $k$ counters and a number $r \in \mathbb{N}$, encoded in unary. A counter $c_i$ reverses if the last (non-test) operation affecting it was $\mathsf{inc}_i$ and the next operation is $\mathsf{dec}_i$, or vice versa. A run is $r$-reversal bounded if every counter reverses at most $r$ times. The language of $(\mathcal{A}, r)$ is

$$L(\mathcal{A}, r) = \{ w \in \Sigma^* \mid \exists q \in F,\ r\text{-reversal bounded run } (q_0, \varepsilon, \mathbf{0}) \xrightarrow{w} (q, \varepsilon, \mathbf{0}) \}.$$

A $(k, r)$-RBCA (reversal-bounded counter automaton) is a $(k, r)$-PRBCA where $\mathcal{A}$ only uses counter operations. We denote by RBCA and PRBCA the classes of RBCA languages and PRBCA languages, respectively.

Notice that we impose the reversal bound externally (following [32]) whereas in alternative definitions found in the literature the automaton has to ensure internally that the number of reversals on every (accepting) run does not exceed r, e.g. [36]. Clearly, our definition subsumes the latter one; in particular, Theorem 1 also holds for (P)RBCAs with an internally checked reversal bound.

A $d$-dimensional Z-VASS (Z-vector addition system with states) is a tuple $\mathcal{V} = (Q, \Sigma, q_0, T, F)$, where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $q_0 \in Q$ is an initial state, $T$ is a finite set of transitions $(p, w, \mathbf{v}, p') \in Q \times \Sigma^* \times \mathbb{Z}^d \times Q$, and $F \subseteq Q$ is a set of final states. A configuration of a Z-VASS is a tuple $(p, \mathbf{v}) \in Q \times \mathbb{Z}^d$. We write $(p, \mathbf{u}) \xrightarrow{w} (p', \mathbf{u}')$ if there is a transition $(p, w, \mathbf{v}, p')$ such that $\mathbf{u}' = \mathbf{u} + \mathbf{v}$. We extend this notation to longer runs in the natural way. The language of the Z-VASS is defined as

$$L(\mathcal{V}) = \{ w \in \Sigma^\* \mid \exists q \in F \colon (q\_0, \mathbf{0}) \xrightarrow{w} (q, \mathbf{0}) \}.$$
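The acceptance condition can be illustrated by a minimal sketch, under the simplifying assumption that every transition reads exactly one letter (the definition above allows arbitrary words from $\Sigma^*$, including the empty word). The function `zvass_accepts` and the example system `EQUAL_AB` are hypothetical names introduced here for illustration only.

```python
def zvass_accepts(word, transitions, q0, finals, dim):
    """Check acceptance for a Z-VASS whose transitions each read one letter.

    transitions: iterable of (p, letter, vector, q), vector a tuple in Z^dim.
    Assumption (not from the paper): no transition reads the empty word,
    so the set of reachable configurations after each letter is finite.
    """
    zero = (0,) * dim
    configs = {(q0, zero)}
    for letter in word:
        successors = set()
        for state, counters in configs:
            for src, a, effect, dst in transitions:
                if src == state and a == letter:
                    updated = tuple(x + y for x, y in zip(counters, effect))
                    successors.add((dst, updated))
        configs = successors
    # Accept iff some final state is reached with all counters equal to zero.
    return any(q in finals and counters == zero for q, counters in configs)


# One state, one Z-counter: words with equally many a's and b's.
EQUAL_AB = [("s", "a", (1,), "s"), ("s", "b", (-1,), "s")]
```

Note that the word `"ba"` is accepted by `EQUAL_AB` although the counter is $-1$ after the first letter: a Z-counter may become negative and is only compared with zero at the end of the run.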

A ($d$-dimensional) Z-grammar is a tuple $G = (N, \Sigma, S, P)$ with disjoint finite sets $N$ and $\Sigma$ of nonterminal and terminal symbols, a start nonterminal $S \in N$, and a finite set $P$ of productions of the form $(A, u, \mathbf{v}) \in N \times (N \cup \Sigma)^* \times \mathbb{Z}^d$. We also write $(A \to u, \mathbf{v})$ instead of $(A, u, \mathbf{v})$. We call $\mathbf{v}$ the (counter) effect of the production $(A \to u, \mathbf{v})$. For words $x, y \in (N \cup \Sigma)^*$, we write $x \Rightarrow_{\mathbf{v}} y$ if there is a production $(A \to u, \mathbf{v})$ such that $x = rAs$ and $y = rus$ for some $r, s \in (N \cup \Sigma)^*$. Moreover, we write $x \Rightarrow^*_{\mathbf{v}} y$ if there are words $x_1, \ldots, x_n \in (N \cup \Sigma)^*$ and $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{Z}^d$ with $x \Rightarrow_{\mathbf{v}_1} x_1 \Rightarrow_{\mathbf{v}_2} \cdots \Rightarrow_{\mathbf{v}_n} x_n = y$ and $\mathbf{v} = \mathbf{v}_1 + \cdots + \mathbf{v}_n$. We use the notation $\Rightarrow$ if the counter effects do not matter: We have $x \Rightarrow y$ if there exists $\mathbf{v}$ such that $x \Rightarrow_{\mathbf{v}} y$; and similarly for $\Rightarrow^*$. If derivations are restricted to a subset $Q \subseteq P$ of productions, we write $\Rightarrow_Q$ (resp. $\Rightarrow^*_Q$).

The language of the Z-grammar $G$ is the set of all words $w \in \Sigma^*$ such that $S \Rightarrow^*_{\mathbf{0}} w$. In other words, $w$ belongs to the language if there exists a derivation $S \Rightarrow^* w$ where the effects of all occurring productions sum to the zero vector $\mathbf{0}$. Z-grammars of dimension $d$ are also known as valence grammars over $\mathbb{Z}^d$ [21].

For our purposes it suffices to assume a unary encoding of the $\mathbb{Z}^d$-vectors (effects) occurring in Z-VASS and Z-grammars. However, this is not a restriction: Counter updates with $n$-bit binary encoded numbers can be easily simulated with unary encodings at the expense of $dn$ fresh counters (see the full version [5]).

Conversion results. The following is our first main theorem:

Theorem 1. RBCA can be converted into Z-VASS in logarithmic space. PRBCA can be converted into Z-grammars in logarithmic space.

By convert, we mean a translation that preserves the accepted (resp. generated) language. There are several machine models that are equivalent (in terms of accepted languages) to RBCA. With Theorem 1, we provide the last missing translation:

Corollary 1. The following models can be converted into each other in logarithmic space: (i) RBCA, (ii) Z-VASS, (iii) Parikh automata with ∃PA acceptance, and (iv) Parikh automata with semilinear acceptance.

Roughly speaking, a Parikh automaton is a machine with counters that can only be incremented. Then, a run is accepting if the final counter values belong to some semilinear set. Parikh automata were introduced by Klaedtke and Rueß [41], where the acceptance condition is specified using a semilinear representation (with base and period vectors), yielding (iv) above. As done, e.g., in [33], one could also specify it using an existential Presburger formula (briefly ∃PA), yielding the model in (iii) above. Theorem 1 proves (i)⇒(ii), whereas (ii)⇒(i) is easy (a clever and very efficient translation is given in [40, Theorem 4.5]). Moreover, (ii)⇒(iii) and (ii)⇒(iv) are clear as well. For (iii)⇒(ii), one can proceed as in [30, Prop. V.1], and (iv)⇒(ii) is also simple.

Unboundedness predicates. We shall use Theorem 1 to prove our second main theorem, which involves unboundedness predicates as introduced in [15]. In [15], unboundedness predicates can be one-dimensional or multi-dimensional, but in this work, we only consider one-dimensional unboundedness predicates.

Let Σ be an alphabet. A (language) predicate is a set of languages over Σ. If p is a predicate and L ⊆ Σ<sup>∗</sup> is a language, then we write p(L) to denote that p holds for the language L (i.e. L ∈ p). A predicate p is called a (one-dimensional) unboundedness predicate if the following conditions are met for all K, L ⊆ Σ∗:

(U1) If p(K) and K ⊆ L, then p(L).

(U2) If p(K ∪ L), then p(K) or p(L).

(U3) If p(K · L), then p(K) or p(L).

(U4) p(L) if and only if p(F(L)).

Here F(L) = {v ∈ Σ<sup>∗</sup> | ∃u, w ∈ Σ<sup>∗</sup> : uvw ∈ L} is the set of factors of L (sometimes also called infixes). In particular, the last condition says that p only depends on the set of factors occurring in a language.
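For a finite language, the factor set $F(L)$ can be computed directly. The following minimal sketch (the helper name `factors` is hypothetical) does so and can be used to check on small examples that $F(F(L)) = F(L)$, which is why a predicate satisfying (U4) cannot distinguish $L$ from $F(L)$.

```python
def factors(language):
    """Return the set of all factors (infixes) of the words of a finite language."""
    return {
        word[i:j]
        for word in language
        for i in range(len(word) + 1)
        for j in range(i, len(word) + 1)
    }
```

The empty word is a factor of every word, so `factors(L)` contains `""` whenever `L` is non-empty.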

For an unboundedness predicate p and a class C of finitely represented languages (such as automata or grammars), let p(C) denote the problem of deciding p for a given language L from C. Formally, p(C) is the following decision problem:

Given: A language L from C.

Question: Does p(L) hold?

For example, p(RBCA) is the problem of deciding p for reversal-bounded multicounter automata and p(NFA) is the problem of deciding p for NFAs. We mention that the axioms (U1)–(U4) are slightly stronger than the axioms used in [15], but the resulting set of decision problems is the same with either definition (since in [15], one always decides whether p(F(L)) holds). Thus, the statement of Theorem 2 is unaffected by which definition is used. See the full version [5] for details.

The following examples of (one-dimensional) unboundedness predicates for languages L ⊆ Σ<sup>∗</sup> have already been established in [15]. We mention them here to give an intuition for the range of applications of our results:

Not being bounded: Let $p_{\mathsf{notb}}(L)$ if and only if $L$ is not a bounded language.

Non-emptiness: Let $p_{\neq\emptyset}(L)$ if and only if $L \neq \emptyset$.

Infinity: Let $p_{\infty}(L)$ if and only if $L$ is infinite.

Factor-universality: Let $p_{\mathsf{funi}}(L)$ if and only if $\Sigma^* \subseteq F(L)$.

It is not difficult to prove that these are unboundedness predicates, but proofs can be found in [15]. The following is our second main theorem:

Theorem 2. Let p be a one-dimensional unboundedness predicate. There is an NP reduction from p(PRBCA) to p(PDA). Moreover, there is an NP reduction from p(RBCA) to p(NFA).

Here, an NP reduction from problem A ⊆ Σ<sup>∗</sup> to B ⊆ Σ<sup>∗</sup> is a non-deterministic polynomial-time Turing machine such that for every input word w ∈ Σ∗, we have w ∈ A iff there exists a run of the Turing machine producing a word in B.

Let us now see some applications of Theorem 2, see also Table 1. The following completeness results are all meant w.r.t. deterministic logspace reductions.

Corollary 2. Boundedness for PRBCA and for RBCA is coNP-complete.

For Corollary 2, we argue that deciding non-boundedness is NP-complete. To this end, we apply Theorem 2 to the predicate $p_{\mathsf{notb}}$ and obtain an NP upper bound, because boundedness for context-free languages is decidable in polynomial time [24]. The NP lower bound follows easily from NP-hardness of the non-emptiness problem for RBCA [28, Theorem 3] and thus PRBCA.

Corollary 3. Finiteness for PRBCA and for RBCA is coNP-complete.

We show Corollary 3 by proving that checking infinity is NP-complete. The upper bound follows from Theorem 2 via the predicate $p_{\infty}$. As above, NP-hardness is inherited from the non-emptiness problem for RBCA and PRBCA.

The results in Corollary 3 are, however, not new. They follow directly from the fact that for a given PRBCA (or RBCA), one can construct in polynomial time a formula in existential Presburger arithmetic (∃PA) for its Parikh image, as shown in [36] for RBCA and in [32] for PRBCA. It is a standard result about ∃PA that for each formula $\varphi$, there exists a bound $B$ such that (i) $B$ is at most exponential in the size of $\varphi$ and (ii) $\varphi$ defines an infinite set if and only if $\varphi$ is satisfied for some vector with some entry above $B$. For example, this can be deduced from [53]. Therefore, one can easily construct a second ∃PA formula $\varphi'$ such that $\varphi$ defines an infinite set if and only if $\varphi'$ is satisfiable.

Corollary 4. Factor universality for RBCA is PSPACE-complete.

Whether factor universality is decidable for RBCA was left as an open problem in [17, 18] (there under the term infix density). Corollary 4 follows from Theorem 2 using $p_{\mathsf{funi}}$, because factor universality for NFAs is PSPACE-complete: To decide if $\Sigma^* \subseteq F(R)$, for a regular language $R$, we can just compute an automaton for $F(R)$ and check inclusion in PSPACE. For the lower bound, one can reduce the PSPACE-complete universality problem for NFAs, since for $R \subseteq \Sigma^*$, the language $(R\#)^* \subseteq (\Sigma \cup \{\#\})^*$ is factor universal if and only if $R = \Sigma^*$. Note that factor universality is known to be undecidable already for one-counter languages [18], and thus in particular for PRBCA. However, it is decidable for pushdown automata with a bounded number of reversals of the stack [18].

Beyond pushdowns. Theorem 2 raises the question of whether for any class M of machines, one can reduce any unboundedness predicate for M extended with reversal-bounded counters to the same predicate for just M. This is not the case: For example, consider second-order pushdown automata, 2-PDA for short. If we extend these by adding reversal-bounded counters, then we obtain 2-PRBCA. The infinity problem is decidable for 2-PDA [34] (see [3, 4, 14, 31, 52, 56] for stronger results). However, the class of 2-PRBCA does not even have decidable emptiness, let alone decidable infinity. This is shown in [57, Proposition 7] (see [42, Theorem 4] for an alternative proof). Thus, infinity for 2-PRBCA cannot be reduced to infinity for 2-PDA.

Growth. Finally, we employ the methods of the proof of Theorem 2 to show a dichotomy of the growth behavior of languages accepted by RBCA. For an alphabet $\Sigma$, we denote by $\Sigma^{\le m}$ the set of all words over $\Sigma$ of length at most $m$. We say that a language $L \subseteq \Sigma^*$ has polynomial growth<sup>2</sup> if there is a polynomial $p(x)$ such that $|L \cap \Sigma^{\le m}| \le p(m)$ for all $m \ge 0$. Languages of polynomial growth are also called sparse or poly-slender. We say that $L$ has exponential growth if there is a real number $r > 1$ such that $|L \cap \Sigma^{\le m}| \ge r^m$ for infinitely many $m$. Since a language of the form $w_1^* \cdots w_n^*$ clearly has polynomial growth, it is well-known that bounded languages have polynomial growth. We show that (a) within the PRBCA languages (and in particular within the RBCA languages), the converse is true as well and (b) all other languages have exponential growth (in contrast to some models, such as 2-PDA [27], where this dichotomy does not hold):

<sup>2</sup> In [24], polynomial and exponential growth are defined with $\Sigma^m$ in place of $\Sigma^{\le m}$, but this leads to equivalent notions, see the full version [5].

Theorem 3. Let L be a language accepted by a PRBCA. Then L has polynomial growth if and only if L is bounded. If L is not bounded, it has exponential growth.
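The two growth regimes can be observed on small examples by brute-force counting. The sketch below is only an illustration under stated assumptions: the helper `growth` is a hypothetical name, languages are given by a membership predicate, and the enumeration takes time exponential in $m$. It compares the bounded language $a^* b^*$ with the non-bounded language $\{a, b\}^*$.

```python
import re
from itertools import product


def growth(member, alphabet, m):
    """Count the words of length at most m over `alphabet` accepted by `member`."""
    return sum(
        1
        for length in range(m + 1)
        for letters in product(alphabet, repeat=length)
        if member("".join(letters))
    )


def in_a_star_b_star(word):
    # Membership in the bounded language a*b*.
    return re.fullmatch(r"a*b*", word) is not None


def in_sigma_star(word):
    # Membership in {a, b}*, which is not bounded.
    return True
```

For $a^* b^*$ there are $n + 1$ words of each length $n$, so the count up to length $m$ is $(m+1)(m+2)/2$, a polynomial in $m$. For $\{a, b\}^*$ the count is $2^{m+1} - 1$.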

## 3 Translating reversal-bounded counters into Z-counters

Reducing the number of reversals to one. In this section we prove Theorem 1, the conversion from RBCA to Z-VASS. In [28, Lemma 1], it is claimed that given a $(k, r)$-RBCA, one can construct in time polynomial in $k$ and $r$ a $(k\lceil (r + 1)/2 \rceil, 1)$-RBCA that accepts the same language. The reference [2] that they provide does include such a construction [2, proof of Theorem 5]. The construction in [2] is only a rough sketch and makes no claims about complexity, but by our reading of the construction, it keeps track of the reversals of each counter in the state, which would result in an exponential blow-up.

Instead, we proceed as follows. Consider a $(k, r)$-RBCA with counters $c_1, \ldots, c_k$. Without loss of generality, assume $r = 2m - 1$. We will construct an equivalent $(2k(r + 1), 1)$-RBCA. Looking at the behavior of a single counter $c_i$, we can decompose every $r$-reversal bounded run into subruns without reversals. We call these subruns phases and number them from 1 to at most $2m$. The odd (even) numbered phases are positive (negative), where $c_i$ is only incremented (decremented). We replace $c_i$ by $m$ one-reversal counters $c_{i,1}, \ldots, c_{i,m}$, where $c_{i,j}$ records the increments on $c_i$ during the positive phase $2j - 1$.

However, our machine needs to keep track of which counters are in which phase, in order to know which of the counters $c_{i,j}$ it currently has to use. We achieve this as follows: For each of the $k$ counters $c_i$, we also have an additional set of $2m = r + 1$ "phase counters" $p_{i,1}, \ldots, p_{i,2m}$ to store which phase we are in. This gives $km + k(r + 1) \le 2k(r + 1)$ counters in total. We encode that counter $c_i$ is in phase $j$ by setting $p_{i,j}$ to 1 and setting $p_{i,j'}$ to 0 for each $j' \ne j$. Since we only ever increase the phase, the phase counters are one-reversal as well.

Using non-zero-tests, at any point, the automaton can nondeterministically guess and verify the current phase of each counter. This allows it to pick the correct counter $c_{i,j}$ for each instruction. When counter $c_i$ is in a positive phase $2j - 1$, then increments and decrements on $c_i$ are simulated as follows:

– increment: increment $c_{i,j}$.

– decrement: go into the next (negative) phase $2j$; then non-deterministically pick some $\ell \in [1, j]$ and decrement $c_{i,\ell}$. We cannot simply decrement $c_{i,j}$, as we might have switched to phase $j$ while $c_i$ had a non-zero value and hence it is possible that $c_i$ could be decremented further than just $c_{i,j}$ allows.

When counter $c_i$ is in a negative phase $2j$, then we simulate increments and decrements as follows:

– increment: go into the next phase $2j + 1$ (unless $j = m$; then the machine blocks) and increment $c_{i,j+1}$.

– decrement: non-deterministically pick some $\ell \in [1, j]$ and decrement $c_{i,\ell}$.

Finally, to simulate a zero-test on $c_i$, we test all counters $c_{i,1}, \ldots, c_{i,m}$ for zero, while for the simulation of a non-zero-test on $c_i$ we non-deterministically pick one of the counters $c_{i,1}, \ldots, c_{i,m}$ to test for non-zero.

Correctness can be easily verified by the following properties. If at some point $c_i$ is in phase $2j - 1$ or $2j$ then (i) $\sum_{\ell=1}^{j} c_{i,\ell} = c_i$, (ii) the counters $c_{i,1}, \ldots, c_{i,j}$ have made at most one reversal, and (iii) the counters $c_{i,j+1}, \ldots, c_{i,m}$ have not been touched (in particular, they are zero). Furthermore, if $c_i$ is in a positive phase $2j - 1$ then $c_{i,j}$ has made no reversal yet.

Note that this construction replaces every transition of the original system with O(r) new transitions (and states). Our construction therefore yields only a linear blowup in the size of the system (constant if r is fixed). See the full version [5] for the details of the construction.
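The simulation of a single counter by one-reversal counters can be sketched as follows. This is an illustration under stated assumptions, not the construction from the full version: the function name `simulate_split` is hypothetical, only one counter is tracked, the phase is kept in a plain variable rather than in phase counters, and the non-deterministic choice of the sub-counter to decrement is resolved by always taking the largest admissible index with a positive value.

```python
def simulate_split(ops, m):
    """Simulate one counter with at most 2m - 1 reversals by m one-reversal counters.

    ops: list of "inc" / "dec" operations on the original counter.
    Returns the final values of the m sub-counters, or None if the
    simulation blocks (too many reversals, or a decrement on value zero).
    """
    sub = [0] * (m + 1)  # sub[1..m]; index 0 is unused
    phase = 1            # odd phases are positive, even phases are negative
    for op in ops:
        j = (phase + 1) // 2  # phases 2j - 1 and 2j both use index j
        if op == "inc":
            if phase % 2 == 0:
                # Negative phase 2j: move on to the positive phase 2j + 1.
                if j == m:
                    return None  # reversal bound exceeded, the machine blocks
                phase += 1
                j += 1
            sub[j] += 1
        else:
            if phase % 2 == 1:
                # Positive phase 2j - 1: move on to the negative phase 2j.
                phase += 1
            # Assumption: deterministic stand-in for the guess of an index in [1, j].
            candidates = [l for l in range(1, j + 1) if sub[l] > 0]
            if not candidates:
                return None  # the original counter would drop below zero
            sub[max(candidates)] -= 1
    return sub[1:]
```

In this sketch each sub-counter with index $\ell$ is incremented only during phase $2\ell - 1$ and decremented only afterwards, so it makes at most one reversal, and the sum of the sub-counters always equals the value of the simulated counter.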

From **1**-reversal to Z-counters. We now turn the (k, 1)-RBCA into a Z-VASS. The difference between a 1-reversal-bounded counter and a Z-counter is that (i) a non-negative counter should block if it is decremented on counter value 0, and (ii) a 1-reversal-bounded counter allows (non-)zero-tests. Observe that all zero-tests occur before the first increment or after the last decrement. All non-zero-tests occur between the first increment and the last decrement.

If the number $k$ of counters is bounded, then the following simple solution works. The Z-VASS stores the information which of the counters has not been incremented yet and which counters will not be incremented again in the future. This information suffices to simulate the counters faithfully (in terms of the properties (i) and (ii) above) and increases the state space by a factor of $2^k \cdot 2^k$. The latter information needs to be guessed (by the automaton) and is verified by the requirement that all counters are zero in the end.

In the general case we introduce a variant of Z-VASS that can guess polynomially many bits in the beginning and read them throughout the run. A $d$-dimensional Z-VASS with guessing (Z-VASSG) has almost the same format as a $d$-dimensional Z-VASS, except that each transition additionally carries a propositional formula over some finite set of variables $X$. A word $w \in \Sigma^*$ is accepted by the Z-VASSG if there exists an assignment $\nu \colon X \to \{0, 1\}$ and an accepting run $(q_0, \mathbf{0}) \xrightarrow{w} (q, \mathbf{0})$ for some $q \in F$ such that all formulas appearing throughout the run are satisfied by $\nu$.

We have to eliminate zero- and non-zero-tests of the (k, 1)-RBCA. Whether a (non-)zero-test is successful depends on which phase a counter is currently in (and whether in the end, every counter is zero; but we assume that our acceptance condition ensures this). Each counter goes through at most 4 phases:


1. before the first increment,
2. the "increment phase",
3. the "decrement phase",
4. after the last decrement.

Hence, every run can be decomposed into $4k$ (possibly empty) segments, in which no counter changes its phase. The idea is to guess the phase of each counter in each segment. Hence, we have propositional variables $p_{i,j,\ell}$, for $i \in [1, 4k]$, $j \in [1, k]$, and $\ell \in [1, 4]$. Then $p_{i,j,\ell}$ is true iff in segment $i$, counter $j$ is in phase $\ell$. We will have to check that the assignment is admissible for each counter, meaning that the sequence of phases for each counter adheres to the order described above.

We modify the machine as follows. In its state, it keeps a number $i \in [1, 4k]$ which holds the current segment. At the beginning of the run, the machine checks that the assignment $\nu$ is admissible using a propositional formula: It checks that (i) for each segment $i$ and each counter $j$ there exists exactly one phase $\ell$ so that $p_{i,j,\ell}$ is true, and (ii) the order of phases above is obeyed. Then, for every operation on a counter, the machine checks that the operation is consistent with the current segment. Moreover, if the current operation warrants a change of the segment, then the segment counter $i$ is incremented. For example, if a counter in phase 1 is incremented, it switches to phase 2 and the segment counter is incremented; or, if a counter in phase 3 is tested for zero, it switches to phase 4 and the segment counter is incremented.

With these modifications, we can zero-test by checking variables corresponding to the current segment: A zero-test can only succeed in phase 1 and 4. Similarly, for a non-zero-test, we can check if the counter is in phase 2 or 3.

Turning a Z-VASSG into a Z-VASS. To handle the general case mentioned above, we need to show how to convert Z-VASSG into ordinary Z-VASS. In a preparatory step, we ensure that each formula is a literal. A transition labeled by a formula $\varphi$ is replaced by a series-parallel graph: After bringing $\varphi$ into negation normal form by pushing negations inwards, we can replace conjunctions by a series composition and disjunctions by a parallel composition (non-determinism).
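The preparatory step can be sketched on formulas represented as nested tuples. The representation and the function names `nnf` and `paths` are assumptions made for this illustration. The function `paths` lists the literal sequences along the source-to-sink paths of the series-parallel graph; it is meant only to show which literals a single run checks, since the graph itself has size linear in the formula while the number of paths can be exponential.

```python
def nnf(formula, negate=False):
    """Push negations inwards.

    Input formulas: ("var", x), ("not", f), ("and", f, g), ("or", f, g).
    Output formulas: ("lit", x, polarity), ("and", f, g), ("or", f, g).
    """
    kind = formula[0]
    if kind == "var":
        return ("lit", formula[1], not negate)
    if kind == "not":
        return nnf(formula[1], not negate)
    left = nnf(formula[1], negate)
    right = nnf(formula[2], negate)
    # De Morgan: under a negation, "and" and "or" are swapped.
    if (kind == "and") != negate:
        return ("and", left, right)
    return ("or", left, right)


def paths(graph):
    """List the literal sequences read along the paths of the series-parallel graph."""
    if graph[0] == "lit":
        return [[(graph[1], graph[2])]]
    left, right = paths(graph[1]), paths(graph[2])
    if graph[0] == "and":
        # Series composition: first the left part, then the right part.
        return [p + q for p in left for q in right]
    # Parallel composition: a non-deterministic choice between both parts.
    return left + right
```

For example, the formula $\neg(x \wedge \neg y)$ becomes $\neg x \vee y$, and the resulting graph offers two parallel edges, one checking the literal $\neg x$ and one checking $y$.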

The Z-VASS works as follows. In addition to the original counters of the Z-VASSG, it has for each variable $x \in X$ two additional counters: $x^+$ and $x^-$. Here, $x^+$ ($x^-$) counts how many times $x$ is read with a positive (negative) assignment. By making sure that either $x^+ = 0$ or $x^- = 0$ in the end, we guarantee that we always read the same value of $x$.

Thus, in order to check a literal, our Z-VASS increments the corresponding counter. In the end, before reaching a final state, it goes through each variable $x \in X$ and either enters a loop decrementing $x^+$ or a loop decrementing $x^-$. Then, it can reach the zero vector only if all variable checks had been consistent.
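The consistency check can be mirrored by a small sketch, assuming a run is summarized by the list of literals it has checked. The function name `checks_consistent` is hypothetical. Since the final loops decrement only one of the two counters of each variable, the zero vector is reachable exactly when, for every variable, at least one of the two counters was never incremented.

```python
from collections import Counter


def checks_consistent(literal_checks):
    """Return True iff no variable was read with both polarities.

    literal_checks: iterable of (variable, polarity) pairs, one per checked literal.
    """
    positive, negative = Counter(), Counter()
    for variable, polarity in literal_checks:
        if polarity:
            positive[variable] += 1  # plays the role of the counter x^+
        else:
            negative[variable] += 1  # plays the role of the counter x^-
    variables = set(positive) | set(negative)
    # One final loop per variable can empty only one of its two counters.
    return all(positive[x] == 0 or negative[x] == 0 for x in variables)
```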

From PRBCA to Z-grammars. It remains to convert in logspace a $(k, r)$-PRBCA into an equivalent Z-grammar. Just as for converting an RBCA into a Z-VASS, one can convert a PRBCA into an equivalent Z-PVASS (pushdown vector addition system with Z-counters). Afterwards, one applies the classical transformation from pushdown automata to context-free grammars (a.k.a. triple construction), cf. [1, Lemma 2.26]: We introduce for every state pair $(p, q)$ a nonterminal $X_{p,q}$, deriving all words which are read between $p$ and $q$ (starting and ending with empty stacks). For example, we introduce productions $X_{p,q} \to a X_{p',q'} b$ for all push transitions $(p, a, \gamma, p')$ and pop transitions $(q', b, \bar{\gamma}, q)$. The counter effects of transitions in the Z-PVASS (vectors in $\mathbb{Z}^k$) are translated into effects of the productions, e.g. the effect of the production $X_{p,q} \to a X_{p',q'} b$ above is the sum of the effects of the corresponding push- and pop-transition.

## 4 Deciding unboundedness predicates

Proof overview. In this section, we prove Theorem 2. Let us begin with a sketch. Our task is to take a PRBCA $\mathcal{A}$ and non-deterministically compute a PDA $\mathcal{A}'$ so that $L(\mathcal{A})$ satisfies p if and only if some of the outcomes for $\mathcal{A}'$ satisfy p. It will be clear from the construction that if the input was an RBCA, then the resulting PDA will be an NFA. Using Theorem 1 we will phrase the main part of the reduction in terms of Z-grammars, meaning we take a Z-grammar $G$ as input and non-deterministically compute context-free grammars $G'$.

The idea of the reduction is to identify a set of productions in $G$ that, in some appropriate sense, can be canceled (regarding the integer counter values) by a collection of other productions. Then, $G'$ is obtained by only using a set of productions that can be canceled. Moreover, these productions are used regardless of what counter updates they perform. Then, to show the correctness, we argue in two directions: First, we show that any word derivable by $G'$ occurs as a factor of $L(G)$. Essentially, this is because each production used in $G'$ can be canceled by adding more productions in $G$, thus yielding a complete derivation of $G$. Thus, we have that $L(G') \subseteq F(L(G))$, which by the axioms of unboundedness predicates means that p(L(G′)) implies p(L(G)). Second, we show that $L(G)$ is a finite union of products (i.e. concatenations) $P_i = L_1 \cdot L_2 \cdots L_k$ such that each $L_i$ is either finite or included in $L(G')$ for some $G'$ among all non-deterministic outcomes. Again, by the axioms of unboundedness predicates, this means that if p(L(G)), then p(L(G′)) must hold for some $G'$.

Unboundedness predicates and finite languages. Before we start with the proof, let us observe that we may assume that our unboundedness predicate is only satisfied for infinite sets. First, suppose p is satisfied for {ε}. This implies that p = p<sub>≠∅</sub>, i.e. p holds exactly for the non-empty languages, and hence we can just decide whether p(L) by deciding whether L ≠ ∅, which can be done in NP [32]. From now on, suppose that p is not satisfied for {ε}. Consider the alphabet Σ<sub>1</sub> := {a ∈ Σ | p({a})}. Now observe that if K ⊆ Σ<sup>∗</sup> is finite, then by the axioms of unboundedness predicates, we have p(K) if and only if some letter from Σ<sub>1</sub> appears in K. Thus, if L ⊆ (Σ \ Σ<sub>1</sub>)<sup>∗</sup>, then p(L) can only hold if L is infinite. This motivates the following definition. Given a language L ⊆ Σ<sup>∗</sup>, we define

$$L\_0 = L \cap (\Sigma \backslash \Sigma\_1)^\*, \quad L\_1 = L \cap \Sigma^\* \Sigma\_1 \Sigma^\*.$$

Then, p(L) if and only if p(L<sub>0</sub>) or p(L<sub>1</sub>). Moreover, p(L<sub>1</sub>) is equivalent to L<sub>1</sub> ≠ ∅.

Therefore, our reduction proceeds as follows. We construct (P)RBCA for L<sub>0</sub> and for L<sub>1</sub>. This can be done in logspace, because intersections with regular languages can be done with a simple product construction. Then, we check in NP whether L<sub>1</sub> ≠ ∅. If yes, then we return "unbounded". If no, we regard p as an unboundedness predicate on languages over Σ \ Σ<sub>1</sub> with the additional property that p is only satisfied for infinite languages. Thus, it suffices to prove Theorem 2 in the case that p is only satisfied for infinite sets.
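The split of L into the part avoiding Σ<sub>1</sub> and the part using some letter of Σ<sub>1</sub> can be illustrated on a finite sample of words. This is a minimal sketch, not the logspace product construction on (P)RBCA: words are plain Python strings, and the names `split_language` and `finite_case` are hypothetical.

```python
def split_language(words, sigma1):
    """Split a finite set of words into (L0, L1).

    L0: words that use no letter of sigma1.
    L1: words that contain at least one letter of sigma1.
    """
    l0 = {w for w in words if not any(c in sigma1 for c in w)}
    l1 = {w for w in words if any(c in sigma1 for c in w)}
    return l0, l1

def finite_case(words, sigma1):
    """For a finite set K (and p not satisfied by {epsilon}),
    p(K) holds iff some letter of sigma1 appears in K."""
    return any(c in sigma1 for w in words for c in w)

sample = {"ab", "bb", "c", ""}
l0, l1 = split_language(sample, sigma1={"a"})
```

On this sample, only the word "ab" lands in the second component, so the finite-case test succeeds exactly because that component is non-empty.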

Pumps and cancelation. In order to define our notion of cancelable productions, we need some terminology. We will need to argue about derivation trees for Z-grammars. For any alphabet Γ and d ∈ N, let T<sub>Γ,d</sub> be the set of all finite trees where every node is labeled by both (i) a letter from Γ and (ii) a vector from Z<sup>d</sup>. Suppose G = (N, Σ, P, S) is a d-dimensional Z-grammar. For a production p = (A → u, *v*), we write ϕ(p) := *v* for its associated counter effect. To each derivation in G, we associate a derivation tree from T<sub>N∪Σ,d</sub> as for context-free grammars. The only difference is that whenever we apply a production (A → u, *v*), then the node corresponding to the rewritten A is also labeled with *v*. As in context-free grammars, the leaf nodes carry terminal letters; their vector label is just **0** ∈ Z<sup>d</sup>.

We extend the map ϕ to both vectors in N<sup>P</sup> and to derivation trees. If *u* ∈ N<sup>P</sup>, then ϕ(*u*) = ∑<sub>p∈P</sub> ϕ(p) · *u*[p]. Similarly, if τ is a derivation tree, then ϕ(τ) ∈ Z<sup>d</sup> is the sum of all labels from Z<sup>d</sup>. A derivation tree τ for a derivation A ⇒<sup>∗</sup> u is called complete if A = S, u ∈ Σ<sup>∗</sup> and ϕ(τ) = **0**. In other words, τ derives a terminal word and the total counter effect of the derivation is zero. For such a complete derivation, we also write yield(τ) for the word u. A derivation tree τ is called a pump if it is the derivation tree of a derivation of the form A ⇒<sup>∗</sup> uAv for some u, v ∈ Σ<sup>∗</sup> and A ∈ N. A subset M ⊆ N of the non-terminals is called realizable if there exists a complete derivation of G that contains all non-terminals in M and no non-terminals outside of M.

A production p in P is called M-cancelable if there exist pumps τ<sub>1</sub>, ..., τ<sub>k</sub> (for some k ∈ N) such that (i) p occurs in some τ<sub>i</sub>, (ii) ϕ(τ<sub>1</sub>) + ··· + ϕ(τ<sub>k</sub>) = **0**, i.e. the total counter effect of τ<sub>1</sub>, ..., τ<sub>k</sub> is zero, and (iii) all productions in τ<sub>1</sub>, ..., τ<sub>k</sub> only use non-terminals from M. We say that a subset Q ⊆ P is M-cancelable if all productions in Q are M-cancelable.
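To make the notions of counter effect, complete derivation, and pump concrete, here is a small sketch of vector-labeled derivation trees. The `Node` class and the helper names are hypothetical, and as a modeling assumption an empty right-hand side is represented by a single child labeled with the empty string.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str          # terminal, non-terminal, or "" for an empty right-hand side
    vec: tuple          # counter effect of the production applied at this node
    children: list = field(default_factory=list)

def phi(tree):
    """Total counter effect: component-wise sum of all vector labels."""
    total = list(tree.vec)
    for child in tree.children:
        for i, x in enumerate(phi(child)):
            total[i] += x
    return tuple(total)

def leaves(tree):
    """Leaf labels from left to right."""
    if not tree.children:
        return [tree.label]
    return [leaf for child in tree.children for leaf in leaves(child)]

def is_complete(tree, start, terminals):
    """Root is the start symbol, the yield is terminal, and the effect is zero."""
    word_ok = all(leaf == "" or leaf in terminals for leaf in leaves(tree))
    return tree.label == start and word_ok and not any(phi(tree))

def is_pump(tree, nonterminals):
    """Tree of a derivation A =>* uAv: exactly one non-terminal leaf, equal to the root."""
    open_leaves = [leaf for leaf in leaves(tree) if leaf in nonterminals]
    return open_leaves == [tree.label]

Z = (0,)
# S -> aSb with effect +1, then S -> (empty) with effect -1.
complete = Node("S", (1,), [Node("a", Z), Node("S", (-1,), [Node("", Z)]), Node("b", Z)])
# S => aSb, a pump with effect +1 that is not cancelled on its own.
pump = Node("S", (1,), [Node("a", Z), Node("S", Z), Node("b", Z)])
```

In this toy one-counter grammar, the first tree is a complete derivation of the word ab, while the second is a pump whose effect would have to be cancelled by other pumps.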

The reduction. Using the notions of M-cancelable productions, we are ready to describe how the context-free grammars are constructed. Suppose that M is realizable, that Q ⊆ P is M-cancelable, and that A ∈ M. Consider the language

$$L\_{A,Q} = \{ u, v \in \Sigma^\* \mid \exists \text{ derivation } A \Rightarrow^\*\_Q uAv \}.$$

Thus L<sub>A,Q</sub> consists of all words u and v appearing in derivations (whose counter values are not necessarily zero) of the form A ⇒<sup>∗</sup> uAv, if we only use M-cancelable productions. The L<sub>A,Q</sub> will be the languages L(G′) mentioned above.

It is an easy observation that we can, given G and a subset Q ⊆ P, construct a context-free grammar for L<sub>A,Q</sub>:

Lemma 1. Given a Z-grammar G, a non-terminal A, and a subset Q ⊆ P, we can construct in logspace a context-free grammar for L<sub>A,Q</sub>. Moreover, if G is left-linear, then the construction yields an NFA for L<sub>A,Q</sub>.

We provide details in the full version [5]. Now, our reduction works as follows:


Here, we need to show that steps 1 and 2 can be done in NP:

Lemma 2. Given a subset M ⊆ N, we can check in NP whether M is realizable. Moreover, given M ⊆ N and p ∈ P, we can check in NP if p is M-cancelable.

Both can be done using the fact that for a given context-free grammar, one can construct a Parikh-equivalent existential Presburger formula [55] and the fact that satisfiability of existential Presburger formulas is in NP. See the full version [5] for details. This completes the description of our reduction. Therefore, it remains to show correctness of the reduction. In other words, to prove:

Proposition 1. We have p(L(G)) if and only if p(L<sub>A,Q</sub>) for some subset Q ⊆ P such that there is a realizable M ⊆ N with A ∈ M and Q being M-cancelable.

Proposition 1 will be shown in two lemmas:

Lemma 3. If M is realizable and Q is M-cancelable, then L<sub>A,Q</sub> ⊆ F(L(G)) for every A ∈ M.

Lemma 4. L(G) is included in a finite union of sets of the form K<sub>1</sub> · K<sub>2</sub> ··· K<sub>m</sub>, where each K<sub>i</sub> is either finite or a set L<sub>A,Q</sub>, where Q is M-cancelable for some realizable M ⊆ N, and A ∈ M.

Let us see why Proposition 1 follows from Lemmas 3 and 4.

Proof (Proposition 1). We begin with the "if" direction. Thus, suppose p(L<sub>A,Q</sub>) for A and Q as described. Then by Lemma 3 and the first and fourth axioms of unboundedness predicates, this implies p(L(G)).

For the "only if" direction, suppose p(L(G)). By the first axiom of unboundedness predicates, p must hold for the finite union provided by Lemma 4. By the second axiom, this implies that p(K<sub>1</sub> ··· K<sub>m</sub>) for a finite product K<sub>1</sub> ··· K<sub>m</sub> as in Lemma 4. Moreover, by the third axiom, this implies that p(K<sub>i</sub>) for some i ∈ {1, ..., m}. If K<sub>i</sub> is finite, then by assumption, p(K<sub>i</sub>) does not hold. Therefore, we must have p(K<sub>i</sub>) for some K<sub>i</sub> = L<sub>A,Q</sub>, as required.

Flows. It remains to prove Lemmas 3 and 4. We begin with Lemma 3 and for this we need some more terminology. Let Σ be an alphabet. By Ψ : Σ<sup>∗</sup> → N<sup>Σ</sup>, we denote the Parikh map, which is defined as Ψ(w)(a) = |w|<sub>a</sub> for w ∈ Σ<sup>∗</sup> and a ∈ Σ. In other words, Ψ(w)(a) is the number of occurrences of a in w. If Γ ⊆ Σ is a subset, then π<sub>Γ</sub> : Σ<sup>∗</sup> → Γ<sup>∗</sup> is the homomorphism with π<sub>Γ</sub>(a) = ε for a ∈ Σ \ Γ and π<sub>Γ</sub>(a) = a for a ∈ Γ. We also call π<sub>Γ</sub> the projection to Γ.
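As a quick illustration of the two maps just defined, the following sketch computes a Parikh image and a projection for words given as Python strings. The function names `parikh` and `project` are chosen here for illustration only.

```python
from collections import Counter

def parikh(word):
    """Parikh image: number of occurrences of each letter in the word."""
    return Counter(word)

def project(word, gamma):
    """Projection to gamma: erase every letter that is not in gamma."""
    return "".join(c for c in word if c in gamma)

image = parikh("abca")
projected = project("abca", {"a", "c"})
```

A `Counter` returns 0 for letters that do not occur, which matches Ψ(w)(a) = 0 for absent letters.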

Suppose we have a Z-grammar G = (N, Σ, P, S) with non-terminals N and productions P. For a derivation tree τ, we write Ψ(τ) for the vector in N<sup>P</sup> that counts how many times each production appears in τ. We introduce a map ∂, which counts how many non-terminals each production consumes and produces. Formally, ∂ : N<sup>P</sup> → Z<sup>N</sup> is the monoid homomorphism that sends the production p = A → w to the vector ∂(p) = −A + Ψ(π<sub>N</sub>(w)). Here, −A ∈ Z<sup>N</sup> denotes the vector with −1 at the position of A and 0 everywhere else. A vector *u* ∈ N<sup>P</sup> is a flow if ∂(*u*) = **0**. Observe that a derivation tree τ is a pump if and only if Ψ(τ) is a flow. In this case, we also call the vector *u* ∈ N<sup>P</sup> with *u* = Ψ(τ) a pump.
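The map ∂ and the flow condition can be sketched as follows. The encoding is an assumption made for illustration: productions are stored in a dictionary from a production name to a pair (head, body), and a vector in N<sup>P</sup> is a dictionary from production names to multiplicities.

```python
from collections import Counter

def boundary(u, productions, nonterminals):
    """Compute the vector d(u): each production consumes its head (-1)
    and produces the non-terminals occurring in its body (+1 each)."""
    d = Counter()
    for name, mult in u.items():
        head, body = productions[name]
        d[head] -= mult
        for sym in body:
            if sym in nonterminals:
                d[sym] += mult
    return {A: d[A] for A in nonterminals}

def is_flow(u, productions, nonterminals):
    """A vector is a flow if consumption and production balance for every non-terminal."""
    return all(v == 0 for v in boundary(u, productions, nonterminals).values())

nonterminals = {"A", "B"}
productions = {
    "p1": ("A", ["a", "B", "b"]),   # A -> aBb
    "p2": ("B", ["c", "A"]),        # B -> cA
    "p3": ("A", ["a"]),             # A -> a
}
```

For instance, using p1 and p2 once each balances A and B (this corresponds to the pump A ⇒ aBb ⇒ acAb), whereas p1 alone consumes an A and leaves a B unbalanced.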

The following lemma will provide an easy way to construct derivations. It is a well-known result by Esparza [19, Theorem 3.1], and has since been exploited in several results on context-free grammars. Our formulation is slightly weaker than Esparza's. However, it is enough for our purposes and admits a simple proof, which is inspired by a proof of Kufleitner [44].

Lemma 5. Let *f* ∈ N<sup>P</sup>. Then *f* is a flow if and only if it is a sum of pumps.

Proof. The "if" direction is trivial, because every pump is clearly a flow. Conversely, suppose *f* ∈ N<sup>P</sup> is a flow. We can clearly write *f* = Ψ(τ<sub>1</sub>) + ··· + Ψ(τ<sub>n</sub>), where τ<sub>1</sub>, ..., τ<sub>n</sub> are derivation trees: We can just view each production in *f* as its own derivation tree. Now suppose that we have *f* = Ψ(τ<sub>1</sub>) + ··· + Ψ(τ<sub>n</sub>) so that n is minimal. We claim that then, each τ<sub>i</sub> is a pump, proving the lemma.

Suppose not. Then, without loss of generality, τ<sub>1</sub> is not a pump. Since τ<sub>1</sub> is a derivation tree, this means Ψ(τ<sub>1</sub>) cannot be a flow and thus there must be a non-terminal A with ∂(τ<sub>1</sub>)(A) ≠ 0.

Let us first assume that ∂(τ<sub>1</sub>)(A) > 0. This means there is a non-terminal A occurring at a leaf of τ<sub>1</sub> such that A is not the start symbol of τ<sub>1</sub>. Since *f* = Ψ(τ<sub>1</sub>) + ··· + Ψ(τ<sub>n</sub>) is a flow, we must have ∂(Ψ(τ<sub>2</sub>) + ··· + Ψ(τ<sub>n</sub>))(A) < 0. This, in turn, is only possible if some τ<sub>j</sub> has A as its start symbol. We can therefore merge τ<sub>1</sub> and τ<sub>j</sub> by replacing τ<sub>1</sub>'s A-labelled leaf by the new subtree τ<sub>j</sub>. We obtain a new collection of n − 1 trees whose Parikh image is *f*, in contradiction to the choice of n. If ∂(τ<sub>1</sub>)(A) < 0, then there must be a τ<sub>j</sub> with ∂(τ<sub>j</sub>)(A) > 0 and thus we can insert τ<sub>1</sub> below τ<sub>j</sub>, reaching a similar contradiction.

Constructing derivations. Using flows, we can now prove Lemma 3.

Proof. Suppose there is a derivation τ : A ⇒<sup>∗</sup><sub>Q</sub> uAv with A ∈ M and u, v ∈ Σ<sup>∗</sup>. We have to show that both u and v occur in some word w ∈ L(G). Furthermore, if G is in Chomsky normal form, we can choose w such that |w| is linear in |u| and |v|. Our goal is to construct a derivation of G in which we find u and v as factors. We could obtain a derivation tree by inserting τ into some derivation tree for G (at some occurrence of A), but this might yield non-zero counter values. Therefore, we will use the fact that Q is M-cancelable to find other pumps that can be inserted as well in order to bring the counter back to zero.

Since M ⊆ N is realizable, there exists a complete derivation τ<sub>0</sub> that derives some word w<sub>0</sub> ∈ L(G) and uses precisely the non-terminals in M. Since Q ⊆ P is M-cancelable, we know that for each production p ∈ Q, there exist pumps τ<sub>1</sub>, ..., τ<sub>k</sub> such that (i) p occurs in some τ<sub>i</sub>, (ii) ϕ(τ<sub>1</sub>) + ··· + ϕ(τ<sub>k</sub>) = **0** and (iii) all productions in τ<sub>1</sub>, ..., τ<sub>k</sub> only use non-terminals in M. This allows us to define *f*<sub>p</sub> := Ψ(τ<sub>1</sub>) + ··· + Ψ(τ<sub>k</sub>). Observe that *f*<sub>p</sub> contains only productions with non-terminals from M, we have *f*<sub>p</sub>[p] > 0, and ϕ(*f*<sub>p</sub>) = **0**. We can use the flows *f*<sub>p</sub> to find the desired canceling pumps. Since by Lemma 5, every flow can be decomposed into a sum of pumps, it suffices to construct a particular flow. Specifically, we look for a flow *f*<sub>τ</sub> ∈ N<sup>P</sup> such that:

1. any production p with *f*<sub>τ</sub>[p] > 0 uses only non-terminals from M, and
2. ϕ(*f*<sub>τ</sub> + Ψ(τ)) = **0**.

The first condition ensures that all the resulting pumps can be inserted into τ0. The second condition ensures that the resulting total counter values will be zero. We claim that with

$$\mathbf{f}\_{\tau} = \left(\sum\_{p \in Q} \Psi(\tau)[p] \cdot \mathbf{f}\_p\right) - \Psi(\tau), \tag{1}$$

we achieve these conditions. First, observe that *f*<sub>τ</sub> ∈ N<sup>P</sup>: We have

$$\begin{array}{rcl} \mathbf{f}\_{\tau}[q] & \geq & \Psi(\tau)[q] \cdot \mathbf{f}\_{q}[q] - \Psi(\tau)[q] & = & \Psi(\tau)[q] \cdot (\mathbf{f}\_{q}[q] - 1) \end{array}$$

which is at least zero as *f*<sub>q</sub>[q] must be non-zero by definition. Second, note that *f*<sub>τ</sub> is indeed a flow, because it is a Z-linear combination of flows. Moreover, all productions appearing in *f*<sub>τ</sub> also appear in *f*<sub>p</sub> for some p ∈ Q or in τ, meaning that all non-terminals must belong to M. Finally, the total counter effect of *f*<sub>τ</sub> + Ψ(τ) is zero as *f*<sub>τ</sub> + Ψ(τ) = ∑<sub>p∈Q</sub> Ψ(τ)[p] · *f*<sub>p</sub> is a sum of flows each with total counter effect zero.
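The choice of the canceling flow in (1) can be checked on a toy instance. The sketch below assumes a hypothetical one-counter grammar with two productions, `inc` (A → aA, effect +1) and `dec` (A → Ab, effect −1); vectors in N<sup>P</sup> are dictionaries, and the function names are illustrative.

```python
def cancelling_flow(psi_tau, f):
    """Compute f_tau = (sum over p of psi_tau[p] * f[p]) - psi_tau, as in (1)."""
    result = {}
    for p, count in psi_tau.items():
        for q, mult in f[p].items():
            result[q] = result.get(q, 0) + count * mult
    for p, count in psi_tau.items():
        result[p] = result.get(p, 0) - count
    return result

def effect(u, phi):
    """Counter effect of a production vector: sum of u[p] * phi[p]."""
    dim = len(next(iter(phi.values())))
    total = [0] * dim
    for p, mult in u.items():
        for i, x in enumerate(phi[p]):
            total[i] += mult * x
    return tuple(total)

phi = {"inc": (1,), "dec": (-1,)}
# Both productions are cancelled by the same pair of pumps (one inc, one dec).
f = {"inc": {"inc": 1, "dec": 1}, "dec": {"inc": 1, "dec": 1}}
psi_tau = {"inc": 2}                 # tau: A => aA => aaA
f_tau = cancelling_flow(psi_tau, f)
total = tuple(x + y for x, y in zip(effect(f_tau, phi), effect(psi_tau, phi)))
```

In this instance the canceling flow uses `dec` twice and `inc` not at all, which exactly offsets the two increments performed by τ.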

Now, since *f*<sub>τ</sub> is a flow, Lemma 5 tells us that there are pumps τ′<sub>1</sub>, ..., τ′<sub>m</sub> such that *f*<sub>τ</sub> = Ψ(τ′<sub>1</sub>) + ··· + Ψ(τ′<sub>m</sub>). Therefore, inserting τ and τ′<sub>1</sub>, ..., τ′<sub>m</sub> into τ<sub>0</sub> must yield a derivation of a word that has both u and v as factors and also has counter value

$$\underbrace{\varphi(\tau\_0)}\_{=\mathbf{0}} + \underbrace{\varphi(\tau) + \varphi(\tau\_1') + \cdots + \varphi(\tau\_m')}\_{=\varphi(\tau) + \varphi(\mathbf{f}\_\tau) = \mathbf{0}} = \mathbf{0}.$$

Thus, we have a complete derivation of G. Hence L<sub>A,Q</sub> ⊆ F(L(G)).

Decomposition into finite union. It remains to prove Lemma 4. For the decomposition, we show that there exists a finite set D<sub>0</sub> of complete derivations such that all complete derivations of G can be obtained from some derivation in D<sub>0</sub> and then inserting pumps that produce words in L<sub>A,Q</sub>, for some appropriate A and Q. Here, it is key that the set D<sub>0</sub> of "base derivations" is finite. Showing this for context-free grammars would just require a simple "unpumping" argument based on the pigeonhole principle as in Parikh's theorem [51]. However, in the case of Z-grammars, where D<sub>0</sub> should only contain derivations that have counter value zero, this is not obvious. To achieve this, we employ a well-quasi ordering on (labeled) trees. Recall that a quasi ordering is a reflexive and transitive ordering. For a quasi ordering (X, ≤) and a subset Y ⊆ X, we write Y ↑ for the set {x ∈ X | ∃y ∈ Y : y ≤ x}. We say that (X, ≤) is a well-quasi ordering (WQO) if every non-empty subset Y ⊆ X has a finite subset Y<sub>0</sub> ⊆ Y such that Y ⊆ Y<sub>0</sub> ↑.

We define an ordering ⪯ on all trees in T<sub>N∪Σ,d</sub>. A tree s is a subtree of t if there exists a node x in t such that s consists of all nodes of t that are descendants of x. If τ<sub>1</sub>, ..., τ<sub>n</sub> are trees, then we denote by r[τ<sub>1</sub>, ..., τ<sub>n</sub>] the tree with a root node r and the subtrees τ<sub>1</sub>, ..., τ<sub>n</sub> directly under the root. Now let τ = (A, *u*)[τ<sub>1</sub>, ..., τ<sub>n</sub>] and τ′ = (B, *v*)[σ<sub>1</sub>, ..., σ<sub>m</sub>] be trees in T<sub>N∪Σ,d</sub>. We define the ordering ⪯ as follows. If n = 0 (i.e. τ consists of only one node), then we have τ ⪯ τ′ if and only if A = B and m = 0. If n ≥ 1, then we define inductively:

$$\begin{aligned} \tau \preceq \tau' \quad \iff \quad A = B \quad \text{and} \; \exists \; \text{subtree } \tau'' = (A, \mathbf{u}') [\tau'\_1, \dots, \tau'\_n] \text{ of } \tau' \\ \text{with } \tau\_i \preceq \tau'\_i \text{ for } i = 1, \dots, n \end{aligned}$$

Based on ⪯, we define ⊑ as a slight refinement: We write τ ⊑ τ′ if and only if τ ⪯ τ′ and the set of non-terminals appearing in τ is the same as in τ′.
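The inductive definition of the tree ordering and its refinement can be transcribed almost literally. In this sketch a tree is a pair (label, list of children); vector labels are omitted because the definition above does not constrain them. The function names are hypothetical.

```python
def subtrees(t):
    """Yield t and all of its subtrees."""
    yield t
    for child in t[1]:
        yield from subtrees(child)

def preceq(s, t):
    """The base ordering: equal root labels, and some subtree of t with the
    same label and the same number of children dominates s child by child."""
    (a, s_kids), (b, t_kids) = s, t
    if a != b:
        return False
    if not s_kids:
        return not t_kids
    return any(
        lab == a
        and len(kids) == len(s_kids)
        and all(preceq(x, y) for x, y in zip(s_kids, kids))
        for lab, kids in subtrees(t)
    )

def nonterminals_of(t, nonterminals):
    return {lab for lab, _ in subtrees(t) if lab in nonterminals}

def refined(s, t, nonterminals):
    """The refinement: additionally require the same set of non-terminals."""
    return preceq(s, t) and nonterminals_of(s, nonterminals) == nonterminals_of(t, nonterminals)

leaf = lambda x: (x, [])
small = ("S", [leaf("a"), leaf("b")])                       # S -> ab
big = ("S", [leaf("a"), small, leaf("b")])                  # S -> aSb, then S -> ab
other = ("S", [("A", [leaf("a")]), small])                  # uses an extra non-terminal A
```

Here `small` lies below `big` in both orderings, since `big` arises from `small` by inserting a pump at the root, while `other` dominates `small` only in the base ordering because it uses the additional non-terminal A.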

Lemma 6. (T<sub>N∪Σ,d</sub>, ⊑) is a WQO.

Proof. In [47, Lemma 3.3], it was shown that ⪯ is a WQO. Then ⊑ is the product of equality on a finite set, which is a WQO, and the WQO ⪯.

Lemma 6 allows us to decompose L(G) into a finite union: For each complete derivation τ of G, we define

L<sub>τ</sub>(G) = {w ∈ Σ<sup>∗</sup> | ∃ complete derivation τ′ with τ ⊑ τ′ and yield(τ′) = w}.

Lemma 7. There exists a finite set D<sub>0</sub> ⊆ T<sub>N∪Σ,d</sub> of complete derivations of G such that L(G) = ⋃<sub>τ∈D<sub>0</sub></sub> L<sub>τ</sub>(G).

Proof. Since (T<sub>N∪Σ,d</sub>, ⊑) is a WQO, the set D ⊆ T<sub>N∪Σ,d</sub> of all complete derivations of G has a finite subset D<sub>0</sub> with D ⊆ D<sub>0</sub> ↑. This implies the lemma.

Decomposition into finite product. In light of Lemma 7, it remains to be shown that for each tree τ, we can find a product K<sub>1</sub> · K<sub>2</sub> ··· K<sub>m</sub> of languages such that L<sub>τ</sub>(G) ⊆ K<sub>1</sub> · K<sub>2</sub> ··· K<sub>m</sub> and each K<sub>i</sub> is either finite or is of the form L<sub>A,Q</sub>. We construct the overapproximation of L<sub>τ</sub>(G) inductively as follows. Let M ⊆ N and Q ⊆ P be subsets of the non-terminals and the productions, respectively. If τ has one node, labeled by a ∈ Σ, then we set App<sub>Q</sub>(τ) := {a}. Moreover, if τ = (A, *u*)[τ<sub>1</sub>, ..., τ<sub>n</sub>] for A ∈ N and trees τ<sub>1</sub>, ..., τ<sub>n</sub>, then we set

$$\text{App}\_Q(\tau) := L\_{A,Q} \cdot \text{App}\_Q(\tau\_1) \cdot \text{App}\_Q(\tau\_2) \cdot \cdots \cdot \text{App}\_Q(\tau\_n) \cdot L\_{A,Q}.$$

Finally, we set App(τ) := App<sub>Q</sub>(τ), where Q ⊆ P is the set of all M-cancelable productions, where M is the set of all non-terminals appearing in τ. Now clearly, each App(τ) is a finite product K<sub>1</sub> · K<sub>2</sub> ··· K<sub>m</sub> as desired: This follows by induction on the size of τ. Thus, to prove Lemma 4, the following suffices:

Lemma 8. For every complete derivation tree τ of G, we have L<sub>τ</sub>(G) ⊆ App(τ).

Proof. Suppose w ∈ L<sub>τ</sub>(G) is derived using a complete derivation tree τ′ with τ ⊑ τ′. Then, the set of non-terminals appearing in τ′ must be the same as in τ; we denote it by M. Let Q ⊆ P be the set of all M-cancelable productions. Moreover, since τ ⊑ τ′, we can observe that there exist pumps τ<sub>1</sub>, ..., τ<sub>n</sub> with root non-terminals A<sub>1</sub>, ..., A<sub>n</sub> and nodes x<sub>1</sub>, ..., x<sub>n</sub> in τ such that τ′ can be obtained from τ by replacing each node x<sub>i</sub> by the pump τ<sub>i</sub>.

Since both τ and τ′ are complete derivations of G, each must have counter effect **0**. Thus, ϕ(τ<sub>1</sub>) + ··· + ϕ(τ<sub>n</sub>) = ϕ(τ′) − ϕ(τ) = **0**. Hence, the pumps τ<sub>1</sub>, ..., τ<sub>n</sub> witness that the productions appearing in τ<sub>1</sub>, ..., τ<sub>n</sub> are M-cancelable. Thus, the derivation corresponding to τ<sub>i</sub> uses only productions in Q and thus τ<sub>i</sub> corresponds to A<sub>i</sub> ⇒<sup>∗</sup><sub>Q</sub> u<sub>i</sub>A<sub>i</sub>v<sub>i</sub> for some u<sub>i</sub>, v<sub>i</sub> and we have u<sub>i</sub>, v<sub>i</sub> ∈ L<sub>A<sub>i</sub>,Q</sub>.
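The inductive definition of the overapproximation App<sub>Q</sub>(τ) used in Lemma 8 can be sketched symbolically. In this illustration a tree is a pair (label, list of children), the factor `("L", A)` stands for the language L<sub>A,Q</sub>, and `("letter", a)` stands for the singleton {a}; this encoding and the function name `app` are assumptions.

```python
def app(tree):
    """Return the list of factors of the product App_Q(tree), left to right."""
    label, kids = tree
    if not kids:
        return [("letter", label)]       # single node labeled by a terminal
    factors = [("L", label)]             # L_{A,Q} on the left
    for kid in kids:
        factors.extend(app(kid))
    factors.append(("L", label))         # L_{A,Q} on the right
    return factors

leaf = lambda x: (x, [])
# Complete derivation S => aSb => aabb.
tree = ("S", [leaf("a"), ("S", [leaf("a"), leaf("b")]), leaf("b")])
factors = app(tree)
```

The result is a finite product whose factors are either singletons or languages of the form L<sub>A,Q</sub>, with two copies of L<sub>A,Q</sub> contributed by each inner node.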

## 5 Growth

In this section, we prove Theorem 3. Since clearly, a bounded language has polynomial growth, it remains to be shown that if L is accepted by a PRBCA and L is not bounded, then it has exponential growth. For two languages L<sub>1</sub>, L<sub>2</sub> ⊆ Σ<sup>∗</sup>, we write L<sub>1</sub> →<sub>lin</sub> L<sub>2</sub> if there exists a constant c ∈ N such that for every word w<sub>1</sub> ∈ L<sub>1</sub>, there exists w<sub>2</sub> ∈ L<sub>2</sub> with |w<sub>2</sub>| ≤ c · |w<sub>1</sub>| and w<sub>1</sub> is a factor of w<sub>2</sub>. It is not difficult to observe that for two languages L<sub>1</sub>, L<sub>2</sub> ⊆ Σ<sup>∗</sup>, if L<sub>1</sub> →<sub>lin</sub> L<sub>2</sub> and L<sub>1</sub> has exponential growth, then so does L<sub>2</sub>.

In order to show Theorem 3, we need an adapted version of Lemma 3. A Z-grammar is in Chomsky normal form if all productions are of the form (A → BC, *v*) or (A → a, *v*) with A, B, C ∈ N, a ∈ Σ, and *v* ∈ Z<sup>k</sup>. In other words, the context-free grammar obtained by forgetting all counter vectors is in Chomsky normal form. Fernau and Stiebe [21, Proposition 5.12] have shown that every Z-grammar has an equivalent Z-grammar in Chomsky normal form.

Lemma 9. If G = (N, Σ, P, S) is a Z-grammar in Chomsky normal form, M ⊆ N is realizable, Q ⊆ P is M-cancelable, and A ∈ M, then L<sub>A,Q</sub> →<sub>lin</sub> L(G).

This is shown essentially the same way as Lemma 3. Let us now show that if a language L accepted by a PRBCA is not bounded, then it must have exponential growth. We have seen above that as a PRBCA language, L is generated by some Z-grammar. As shown by Fernau and Stiebe [21, Proposition 5.12], this implies that L = L(G) for some Z-grammar G in Chomsky normal form. Since L is not bounded, Lemma 4 yields A and Q such that L<sub>A,Q</sub> is not a bounded language. It is well-known that any context-free language that is not bounded has exponential growth (this fact has apparently been independently discovered at least six times, see [24] for references). Thus, L<sub>A,Q</sub> has exponential growth. By Lemma 9, we have L<sub>A,Q</sub> →<sub>lin</sub> L and thus L has exponential growth.

Acknowledgments We are grateful to Manfred Kufleitner for sharing the manuscript [44] before it was publicly available. It provides an alternative proof for constructing an existential Presburger formula for the Parikh image of a context-free grammar. The latter was also shown in [55], based on [19]. We use it in Lemma 5, which could also be derived from [19, Theorem 3.1]. However, we provide a simple direct proof of Lemma 5 inspired by Kufleitner's proof.

This work is funded by the European Union (ERC, FINABIS, 101077902). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

## References


puter Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021. IEEE, 2021, pp. 1–13. doi: 10.1109/LICS52264.2021.9470527.


in Computer Science. Springer, 2015, pp. 228–239. doi: 10.1007/978-3-319-21500-6\_18.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Reverse Bisimilarity vs. Forward Bisimilarity**

Marco Bernardo<sup>1</sup> and Sabina Rossi<sup>2</sup>

<sup>1</sup> Università di Urbino, Urbino, Italy marco.bernardo@uniurb.it
<sup>2</sup> Università Ca' Foscari di Venezia, Venice, Italy

**Abstract.** Reversibility is the capability of a system of undoing its own actions starting from the last performed one, in such a way that a past consistent state is reached. This is not trivial for concurrent systems, as the last performed action may not be uniquely identifiable. There are several approaches to address causality-consistent reversibility, some including a notion of forward-reverse bisimilarity. We introduce a minimal process calculus for reversible systems to investigate compositionality properties and equational characterizations of forward-reverse bisimilarity as well as of its two components, i.e., forward bisimilarity and reverse bisimilarity, so as to highlight their differences. The study is conducted not only in a nondeterministic setting, but also in a stochastic one where time reversibility and lumpability for Markov chains are exploited.

## **1 Introduction**

Reversibility started to receive attention in computing several decades ago [15,3]. Landauer's principle states that any irreversible manipulation of information, such as bit erasure or computation path merging, must be accompanied by a corresponding entropy increase. Therefore, any reversible computation, in which no information is lost, may be potentially carried out without releasing any heat. Nowadays, *reversible computing* has many applications ranging from biochemical reaction modeling and parallel discrete-event simulation to robotics, control theory, fault tolerant systems, and concurrent program debugging.

In a reversible system, we can observe two directions of computation: a *forward* one, coinciding with the normal way of computing, and a *backward* one, along which the effects of the forward one are undone when needed in a *causally consistent* way, i.e., by returning to a past consistent state. The latter task is not easy to accomplish in a concurrent system, because the undo procedure necessarily starts from the last performed action and this may not be unique. The usually adopted strategy is that an action can be undone provided that all of its consequences, if any, have been undone beforehand.

In the process algebra literature, two approaches have been developed to reverse a computation based on keeping track of past actions: the dynamic one of [7] and the static one of [24]. The former yields RCCS, a variant of CCS [20] that uses stack-based memories attached to processes to record all the actions executed by those processes. In contrast, the latter proposes a general method, of which CCSK is a result, to reverse calculi, relying on the idea of retaining within the process syntax all executed actions and dynamic operators.

In [24] *forward-reverse bisimilarity* is introduced too. Unlike standard bisimilarity [22,20], it is truly concurrent as it does not satisfy the expansion law of parallel composition into a choice among all possible action sequencings. The interleaving view can be restored by employing *back-and-forth bisimilarity* [8]. This is defined on computation paths instead of states, thus preserving not only causality but also history as backward moves have to occur along the path followed when going forward even in the presence of concurrency.

In this paper, we investigate compositionality properties and equational characterizations of forward-reverse bisimilarity as well as of its two components, i.e., forward bisimilarity and reverse bisimilarity, so as to highlight their differences. To this purpose, we introduce a minimal calculus including only the terminated process 0, the unary action prefix operator a . _ where a stands for an action, and the binary alternative composition operator +, also called choice. These operators are enough to compare the essential features of the three equivalences, in a neutral way with respect to interleaving view vs. true concurrency.

The paper is divided into two parts. In Section 2, we conduct our study on *nondeterministic* reversible processes, with the operational semantic rules defined in the style of [24] generating only forward transitions that are viewed as bidirectional, in lieu of a forward transition relation separated from a backward transition relation. In Section 3, we repeat our study on *stochastic* reversible processes, whose operational semantic rules in the style of [24] generate a single transition relation encompassing both forward transitions and backward transitions, by exploiting time reversibility [13] and lumpability [14] for Markov chains. In Section 4, we recap the differences between forward and reverse bisimilarities.

## **2 The Nondeterministic Case**

In this section, we investigate forward bisimilarity, reverse bisimilarity, and forward-reverse bisimilarity over nondeterministic reversible processes. We start by introducing the syntax (Section 2.1) and the semantics (Section 2.2) for these processes through a minimal calculus, then we provide the definitions of the three equivalences (Section 2.3) and we study their congruence properties (Section 2.4) and equational characterizations (Section 2.5).

#### **2.1 Syntax of Nondeterministic Reversible Processes**

In the formalization of a process, we usually describe only its future behavior, hence the following syntax for sequential processes where a ∈ A:

$$P ::= \underline{0} \mid a . P \mid P + P$$

However, in order to support the definition of the semantics in the style of [24], we need to enrich the syntax above with information about the past, i.e., the actions that have already been executed. Due to the absence of a parallel composition operator, unlike [24] there is no need to add communication keys to executed actions. It thus suffices to mark them with some symbol, which we choose to be †. This yields the following syntax extended with information about the past:

$$P ::= \underline{0} \mid a . P \mid a^\dagger . P \mid P + P$$

We can syntactically characterize several classes of processes generated by the grammar above through suitable predicates. Firstly, we have *initial* processes, i.e., processes in which all the actions are unexecuted:

$$\begin{array}{c} \mathit{initial}(\underline{0})\\ \mathit{initial}(a . P) \Longleftarrow \mathit{initial}(P) \\ \mathit{initial}(P\_1 + P\_2) \Longleftarrow \mathit{initial}(P\_1) \land \mathit{initial}(P\_2) \end{array}$$

Secondly, we have *final* processes, i.e., processes in which all the actions along a single path have been executed:

$$\begin{array}{c} \mathit{final(\underline{0})}\\ \mathit{final(a^\dagger.P}) \Longleftarrow \mathit{final(P)}\\ \mathit{final(P\_1+P\_2)} \Longleftarrow \mathit{(final(P\_1) \land initial(P\_2))} \lor \\ \mathit{(initial(P\_1) \land final(P\_2))} \end{array}$$

Multiple paths arise only in the presence of alternative compositions. At each occurrence of +, only the subprocess chosen for execution advances, while the other one, although not selected, is kept as an initial subprocess within the overall process to support the definition of the semantics in the style of [24].

Thirdly, we have the processes that are *reachable* from an initial one, whose set we denote by P:

$$\begin{array}{c} \mathit{reachable}(\underline{0})\\ \mathit{reachable}(a . P) \Longleftarrow \mathit{initial}(P) \\ \mathit{reachable}(a^\dagger . P) \Longleftarrow \mathit{reachable}(P) \\ \mathit{reachable}(P\_1 + P\_2) \Longleftarrow (\mathit{reachable}(P\_1) \land \mathit{initial}(P\_2)) \lor \\ (\mathit{initial}(P\_1) \land \mathit{reachable}(P\_2)) \end{array}$$
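The three syntactic predicates can be transcribed directly into code. In this sketch, process terms are encoded as nested tuples, which is an assumption made for illustration: `("nil",)` for the terminated process, `("pre", a, P)` for an unexecuted prefix, `("done", a, P)` for an executed (†-decorated) prefix, and `("sum", P1, P2)` for choice.

```python
def initial(p):
    kind = p[0]
    if kind == "nil":
        return True
    if kind == "pre":
        return initial(p[2])
    if kind == "sum":
        return initial(p[1]) and initial(p[2])
    return False                      # executed prefixes are never initial

def final(p):
    kind = p[0]
    if kind == "nil":
        return True
    if kind == "done":
        return final(p[2])
    if kind == "sum":
        return (final(p[1]) and initial(p[2])) or (initial(p[1]) and final(p[2]))
    return False                      # an unexecuted prefix is not final

def reachable(p):
    kind = p[0]
    if kind == "nil":
        return True
    if kind == "pre":
        return initial(p[2])
    if kind == "done":
        return reachable(p[2])
    return (reachable(p[1]) and initial(p[2])) or (initial(p[1]) and reachable(p[2]))

NIL = ("nil",)
start = ("sum", ("pre", "a", NIL), ("pre", "b", NIL))        # a.0 + b.0
left_done = ("sum", ("done", "a", NIL), ("pre", "b", NIL))   # a†.0 + b.0
both_done = ("sum", ("done", "a", NIL), ("done", "b", NIL))  # a†.0 + b†.0
```

The last term is generated by the grammar but is not reachable, because at most one summand of a choice may have advanced.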


#### **2.2 Semantics of Nondeterministic Reversible Processes**

According to the approach of [24], dynamic operators such as action prefix and alternative composition have to be made static by the semantics, so as to retain within the syntax all the information needed to enable reversibility. For the sake of minimality, unlike [24] we do not generate two distinct transition relations, a forward one ⟶ and a backward one ⇝, but a single transition relation, which we implicitly regard as being symmetric like in [8] to enforce the *loop property*: any executed action can be undone and any undone action can be redone.

In our setting, a backward transition from P′ to P (P′ ⇝<sup>a</sup> P) is subsumed by the corresponding forward transition t from P to P′ (P ⟶<sup>a</sup> P′). As will become clear with the definition of behavioral equivalences in Section 2.3, like in [8] when going forward we view t as an *outgoing* transition of P, while when going backward we view t as an *incoming* transition of P′. The semantic rules in Table 1 generate the labeled transition system (P, A, ⟶) where ⟶ ⊆ P × A × P.

$$\begin{array}{ll} (\mathrm{Act}\_{\mathrm{f}})\ \dfrac{\mathit{initial}(P)}{a . P \xrightarrow{a} a^\dagger . P} & (\mathrm{Act}\_{\mathrm{p}})\ \dfrac{P \xrightarrow{b} P'}{a^\dagger . P \xrightarrow{b} a^\dagger . P'} \\ (\mathrm{Cho}\_{\mathrm{l}})\ \dfrac{P\_1 \xrightarrow{a} P\_1' \quad \mathit{initial}(P\_2)}{P\_1 + P\_2 \xrightarrow{a} P\_1' + P\_2} & (\mathrm{Cho}\_{\mathrm{r}})\ \dfrac{P\_2 \xrightarrow{a} P\_2' \quad \mathit{initial}(P\_1)}{P\_1 + P\_2 \xrightarrow{a} P\_1 + P\_2'} \end{array}$$

**Table 1.** Operational semantic rules for nondeterministic reversible processes

The first rule for action prefix ($\text{Act}\_{\text{f}}$, where f stands for forward) applies only if $P$ is initial and retains the executed action in the target process of the generated forward transition by decorating the action itself with $\dagger$. The second rule for action prefix ($\text{Act}\_{\text{p}}$, where p stands for propagation) propagates actions executed by inner initial subprocesses.

In both rules for alternative composition ($\text{Cho}\_{\text{l}}$ and $\text{Cho}\_{\text{r}}$, where l stands for left and r stands for right), the subprocess that has not been selected for execution is retained as an initial subprocess in the target process of the generated transition. When both subprocesses are initial, both rules for alternative composition are applicable; otherwise only one of them can be applied, and in that case it is the non-initial subprocess that can move, because the other one has been discarded at the moment of the selection.

Any state corresponding to a non-final process has at least one outgoing transition, while any state corresponding to a non-initial process has exactly one incoming transition due to the decoration of executed actions. The labeled transition system underlying an initial process turns out to be a tree, whose branching points correspond to occurrences of $+$.

Example 1. The labeled transition systems generated by the rules in Table 1 for the two initial processes a . 0 + a . 0 and a . 0 are depicted below:

As far as the one on the left is concerned, we observe that, in the case of a standard process calculus, a single a-transition from a . 0 + a . 0 to 0 would have been generated due to the absence of action decorations within processes.

#### **2.3 Bisimilarities for Nondeterministic Reversible Processes**

The asymmetry between the relative positions of already executed actions and actions to be executed within reachable processes, as well as the asymmetry between the use of predicates initial and final in the operational semantic rules, determine a number of asymmetries between forward and reverse bisimilarity defined below that will become evident in Sections 2.4 and 2.5.

The difference between the definitions of forward bisimilarity and reverse bisimilarity is that the former considers only outgoing transitions [22,20] whereas the latter considers only incoming transitions. We also address forward-reverse bisimilarity [24], which considers both outgoing transitions and incoming ones. All the equivalences are strong, i.e., they do not abstract from invisible actions.

**Definition 1.** We say that $P\_1, P\_2 \in \mathbb{P}$ are forward bisimilar, written $P\_1 \sim\_{\text{FB}} P\_2$, iff $(P\_1, P\_2) \in \mathcal{B}$ for some forward bisimulation $\mathcal{B}$. A symmetric relation $\mathcal{B}$ over $\mathbb{P}$ is a forward bisimulation iff for all $(P\_1, P\_2) \in \mathcal{B}$ and $a \in A$:

**–** Whenever $P\_1 \xrightarrow{a} P\_1'$, then $P\_2 \xrightarrow{a} P\_2'$ with $(P\_1', P\_2') \in \mathcal{B}$.

**Definition 2.** We say that $P\_1, P\_2 \in \mathbb{P}$ are reverse bisimilar, written $P\_1 \sim\_{\text{RB}} P\_2$, iff $(P\_1, P\_2) \in \mathcal{B}$ for some reverse bisimulation $\mathcal{B}$. A symmetric relation $\mathcal{B}$ over $\mathbb{P}$ is a reverse bisimulation iff for all $(P\_1, P\_2) \in \mathcal{B}$ and $a \in A$:

**–** Whenever $P\_1' \xrightarrow{a} P\_1$, then $P\_2' \xrightarrow{a} P\_2$ with $(P\_1', P\_2') \in \mathcal{B}$.

**Definition 3.** We say that $P\_1, P\_2 \in \mathbb{P}$ are forward-reverse bisimilar, written $P\_1 \sim\_{\text{FRB}} P\_2$, iff $(P\_1, P\_2) \in \mathcal{B}$ for some forward-reverse bisimulation $\mathcal{B}$. A symmetric relation $\mathcal{B}$ over $\mathbb{P}$ is a forward-reverse bisimulation iff for all $(P\_1, P\_2) \in \mathcal{B}$ and $a \in A$:

**–** Whenever $P\_1 \xrightarrow{a} P\_1'$, then $P\_2 \xrightarrow{a} P\_2'$ with $(P\_1', P\_2') \in \mathcal{B}$.

**–** Whenever $P\_1' \xrightarrow{a} P\_1$, then $P\_2' \xrightarrow{a} P\_2$ with $(P\_1', P\_2') \in \mathcal{B}$.

It holds that $\sim\_{\text{FRB}} \, \subsetneq \, \sim\_{\text{FB}} \cap \sim\_{\text{RB}}$. The inclusion is strict because, for example, the two final processes $a^\dagger.\,\underline{0}$ and $a^\dagger.\,\underline{0} + c\,.\,\underline{0}$ are identified by $\sim\_{\text{FB}}$ and by $\sim\_{\text{RB}}$, but distinguished by $\sim\_{\text{FRB}}$, as in the latter process action $c$ is enabled again after undoing $a$. Moreover, $\sim\_{\text{FB}}$ and $\sim\_{\text{RB}}$ are incomparable because for instance:

$$\begin{array}{c} a^{\dagger}.\,\underline{0} \sim\_{\text{FB}} \underline{0} \text{ but } a^{\dagger}.\,\underline{0} \not\sim\_{\text{RB}} \underline{0} \\ a\,.\,\underline{0} \sim\_{\text{RB}} \underline{0} \text{ but } a\,.\,\underline{0} \not\sim\_{\text{FB}} \underline{0} \end{array}$$

The *first asymmetry* is that $\sim\_{\text{FRB}} \, = \, \sim\_{\text{FB}}$ over initial processes, with $\sim\_{\text{RB}}$ strictly coarser, whilst $\sim\_{\text{FRB}} \, \neq \, \sim\_{\text{RB}}$ over final processes because, after going backward, previously discarded subprocesses come into play again in the forward direction.

Example 2. The two processes shown in Example 1 are identified by all the three equivalences. This is witnessed by any bisimulation that contains the pairs $(a\,.\,\underline{0} + a\,.\,\underline{0}, \; a\,.\,\underline{0})$, $(a^\dagger.\,\underline{0} + a\,.\,\underline{0}, \; a^\dagger.\,\underline{0})$, and $(a\,.\,\underline{0} + a^\dagger.\,\underline{0}, \; a^\dagger.\,\underline{0})$.
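On a finite labeled transition system the three bisimilarities can be computed as greatest fixpoints. The following Python sketch is a naive illustration, not an efficient algorithm: the function `bisimilarity`, its flags, and the string names of the states are assumptions of this sketch. It is run on the processes $a^\dagger.\,\underline{0}$ and $a^\dagger.\,\underline{0} + c\,.\,\underline{0}$ used above to show that the inclusion of $\sim\_{\text{FRB}}$ in $\sim\_{\text{FB}} \cap \sim\_{\text{RB}}$ is strict.

```python
from itertools import product

def bisimilarity(states, trans, forward=True, backward=False):
    """Largest symmetric relation closed under the selected transfer conditions.

    trans is a set of (source, action, target) triples. forward checks
    outgoing transitions, backward checks incoming ones.
    """
    rel = set(product(states, repeat=2))

    def matched(p, q):
        if forward:
            for s, a, t in trans:
                if s == p and not any(
                    s2 == q and a2 == a and (t, t2) in rel for s2, a2, t2 in trans
                ):
                    return False
        if backward:
            for s, a, t in trans:
                if t == p and not any(
                    t2 == q and a2 == a and (s, s2) in rel for s2, a2, t2 in trans
                ):
                    return False
        return True

    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            if (p, q) in rel and not (matched(p, q) and matched(q, p)):
                # remove both orientations so that the relation stays symmetric
                rel.discard((p, q))
                rel.discard((q, p))
                changed = True
    return rel

# Transition systems of a.0 and a.0 + c.0 (state names are plain labels)
states = ["a.0", "a†.0", "a.0+c.0", "a†.0+c.0", "a.0+c†.0"]
trans = {
    ("a.0", "a", "a†.0"),
    ("a.0+c.0", "a", "a†.0+c.0"),
    ("a.0+c.0", "c", "a.0+c†.0"),
}
fb = bisimilarity(states, trans, forward=True, backward=False)
rb = bisimilarity(states, trans, forward=False, backward=True)
frb = bisimilarity(states, trans, forward=True, backward=True)
pair = ("a†.0", "a†.0+c.0")
print(pair in fb, pair in rb, pair in frb)
```

The pair survives when only outgoing or only incoming transitions are inspected, but it is removed once both directions are checked, because the predecessors $a\,.\,\underline{0}$ and $a\,.\,\underline{0} + c\,.\,\underline{0}$ differ on the outgoing $c$-transition.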

#### **2.4 Congruence Properties**

In principle, it makes sense that ∼FB identifies processes with a different past and that ∼RB identifies processes with a different future, in particular with 0 that has neither past nor future. However, for ∼FB this results in a compositionality violation with respect to alternative composition. As an example:

$$\begin{array}{c} a^{\dagger}.b.\underline{0} \sim\_{\text{FB}} b.\underline{0} \\ a^{\dagger}.b.\underline{0} + c.\underline{0} \not\sim\_{\text{FB}} b.\underline{0} + c.\underline{0} \end{array}$$

because in $a^\dagger.\,b\,.\,\underline{0} + c\,.\,\underline{0}$ action $c$ is disabled due to the presence of the already executed action $a^\dagger$, while in $b\,.\,\underline{0} + c\,.\,\underline{0}$ action $c$ is enabled as there are no past actions preventing it from occurring. Note that a similar phenomenon does not happen with $\sim\_{\text{RB}}$, as $a^\dagger.\,b\,.\,\underline{0} \not\sim\_{\text{RB}} b\,.\,\underline{0}$ due to the incoming $a$-transition of $a^\dagger.\,b\,.\,\underline{0}$, thus yielding the *second asymmetry* between forward and reverse bisimilarity.

This problem, which does not show up for ∼RB and ∼FRB because these two equivalences cannot identify an initial process with a non-initial one, leads to the following variant of ∼FB that is sensitive to the presence of the past.

**Definition 4.** *We say that* $P\_1, P\_2 \in \mathbb{P}$ *are* past-sensitive forward bisimilar*, written* $P\_1 \sim\_{\text{FB},\text{ps}} P\_2$*, iff* $(P\_1, P\_2) \in \mathcal{B}$ *for some past-sensitive forward bisimulation* $\mathcal{B}$*. A symmetric relation* $\mathcal{B}$ *over* $\mathbb{P}$ *is a* past-sensitive forward bisimulation *iff for all* $(P\_1, P\_2) \in \mathcal{B}$*:*

**–** $\mathit{initial}(P\_1) \Longleftrightarrow \mathit{initial}(P\_2)$*.*

**–** *For all* $a \in A$*, whenever* $P\_1 \xrightarrow{a} P\_1'$*, then* $P\_2 \xrightarrow{a} P\_2'$ *with* $(P\_1', P\_2') \in \mathcal{B}$*.*

Now ∼FB,ps is sensitive to the presence of the past:

$$a^{\dagger}.\,b\,.\,\underline{0} \not\sim\_{\text{FB},\text{ps}} b\,.\,\underline{0}$$

but can still identify non-initial processes having a different past:

$$a\_1^{\dagger}.\,P \sim\_{\text{FB},\text{ps}} a\_2^{\dagger}.\,P$$

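It holds that $\sim\_{\text{FRB}} \, \subsetneq \, \sim\_{\text{FB},\text{ps}} \cap \sim\_{\text{RB}}$, with $\sim\_{\text{FRB}} \, = \, \sim\_{\text{FB},\text{ps}}$ over initial processes, as well as $\sim\_{\text{FB},\text{ps}}$ and $\sim\_{\text{RB}}$ being incomparable because, e.g., for $a\_1 \neq a\_2$:

$$\begin{array}{c} a\_1^{\dagger}.\,P \sim\_{\text{FB},\text{ps}} a\_2^{\dagger}.\,P \text{ but } a\_1^{\dagger}.\,P \not\sim\_{\text{RB}} a\_2^{\dagger}.\,P \\ a\_1\,.\,P \sim\_{\text{RB}} a\_2\,.\,P \text{ but } a\_1\,.\,P \not\sim\_{\text{FB},\text{ps}} a\_2\,.\,P \end{array}$$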

We conclude by formalizing the congruence properties of all the considered equivalences. When present in the results below, side conditions just ensure that the overall processes are reachable.

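**Theorem 1.** *Let* $\sim \, \in \{\sim\_{\text{FB}}, \sim\_{\text{FB},\text{ps}}, \sim\_{\text{RB}}, \sim\_{\text{FRB}}\}$*,* $\sim' \, \in \{\sim\_{\text{FB},\text{ps}}, \sim\_{\text{RB}}, \sim\_{\text{FRB}}\}$*, and* $P\_1, P\_2 \in \mathbb{P}$*:*

**–** *If* $P\_1 \sim P\_2$*, then for all* $a \in A$*:*

	- $a\,.\,P\_1 \sim a\,.\,P\_2$ *provided that* $\mathit{initial}(P\_1) \land \mathit{initial}(P\_2)$*.*
	- $a^\dagger.\,P\_1 \sim a^\dagger.\,P\_2$*.*

**–** *If* $P\_1 \sim' P\_2$*, then for all* $P \in \mathbb{P}$*:*

	- $P\_1 + P \sim' P\_2 + P$ *and* $P + P\_1 \sim' P + P\_2$ *provided that* $\mathit{initial}(P) \lor (\mathit{initial}(P\_1) \land \mathit{initial}(P\_2))$*.*

**–** $\sim\_{\text{FB},\text{ps}}$ *is the coarsest congruence with respect to* $+$ *contained in* $\sim\_{\text{FB}}$*.*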

#### **2.5 Equational Characterizations**

We now investigate the equational characterizations of $\sim\_{\text{FB},\text{ps}}$, $\sim\_{\text{RB}}$, and $\sim\_{\text{FRB}}$ so as to highlight the fundamental laws of these behavioral equivalences. In the following, by deduction system we mean a set comprising the following axioms and inference rules on $\mathbb{P}$, possibly enriched by a set of additional axioms $\mathcal{A}$, corresponding to the fact that $\sim\_{\text{FB},\text{ps}}$, $\sim\_{\text{RB}}$, and $\sim\_{\text{FRB}}$ are equivalence relations as well as congruences with respect to action prefix and alternative composition:


**Table 2.** Axioms characterizing bisimilarity over nondeterministic reversible processes

$$\begin{array}{ll} \text{Reflexivity, symmetry, transitivity:} & P = P, \quad \dfrac{P\_1 = P\_2}{P\_2 = P\_1}, \quad \dfrac{P\_1 = P\_2 \quad P\_2 = P\_3}{P\_1 = P\_3} \\[3ex] \text{.-Substitutivity:} & \dfrac{P\_1 = P\_2 \quad \mathit{initial}(P\_1) \land \mathit{initial}(P\_2)}{a\,.\,P\_1 = a\,.\,P\_2}, \quad \dfrac{P\_1 = P\_2}{a^\dagger.\,P\_1 = a^\dagger.\,P\_2} \\[3ex] \text{+-Substitutivity:} & \dfrac{P\_1 = P\_2 \quad \mathit{initial}(P) \lor (\mathit{initial}(P\_1) \land \mathit{initial}(P\_2))}{P\_1 + P = P\_2 + P \quad P + P\_1 = P + P\_2} \end{array}$$

It is well known that, in the case of bisimilarity over standard nondeterministic processes, alternative composition turns out to be associative and commutative and to admit $\underline{0}$ as neutral element [11]. The same holds true for $\sim\_{\text{FB},\text{ps}}$, $\sim\_{\text{RB}}$, and $\sim\_{\text{FRB}}$ because the two operational semantic rules for alternative composition are symmetric and $\underline{0}$ has no outgoing or incoming transitions. This is formalized by axioms $\mathcal{A}\_1$ to $\mathcal{A}\_3$ in Table 2.

Then, we have axioms specific to $\sim\_{\text{FB},\text{ps}}$. Axioms $\mathcal{A}\_4$ and $\mathcal{A}\_5$ together establish that the past can be neglected when moving only forward, but the presence of the past cannot be ignored. Axiom $\mathcal{A}\_6$ states that a previously non-selected alternative can be discarded after starting moving only forward.

Likewise, we have axioms specific to $\sim\_{\text{RB}}$. Axiom $\mathcal{A}\_7$ means that the future can be completely canceled when moving only backward. Axiom $\mathcal{A}\_8$ states that a previously non-selected alternative can be discarded when moving only backward. Since there are no constraints on $P$, axiom $\mathcal{A}\_8$ subsumes axiom $\mathcal{A}\_3$.

Finally, the idempotency of alternative composition in the case of bisimilarity over standard nondeterministic processes, i.e., $P + P = P$ [11], changes depending on the considered equivalence:


**–** For $\sim\_{\text{FRB}}$, idempotency is formalized by axiom $\mathcal{A}\_{10}$, where function $\mathit{to\_initial}$ brings a process back to its initial version by removing all action decorations:

$$\begin{array}{l} \mathit{to\_initial}(\underline{0}) = \underline{0} \\ \mathit{to\_initial}(a\,.\,P) = a\,.\,P \\ \mathit{to\_initial}(a^{\dagger}.\,P) = a\,.\,\mathit{to\_initial}(P) \\ \mathit{to\_initial}(P\_1 + P\_2) = \mathit{to\_initial}(P\_1) + \mathit{to\_initial}(P\_2) \end{array}$$

This axiom appeared for the first time in [16] and subsumes axioms $\mathcal{A}\_9$ and $\mathcal{A}\_6$ for $\sim\_{\text{FB},\text{ps}}$ as well as axiom $\mathcal{A}\_8$ for $\sim\_{\text{RB}}$.

To prove the ground completeness of the equational characterizations of the three considered bisimilarities, as usual we introduce equivalence-specific normal forms to which every process is shown to be reducible, then we work with normal forms only. All the three normal forms rely on the fact that alternative composition is associative and commutative, hence the binary $+$ can be generalized to the $n$-ary $\sum\_{i \in I}$ for a finite nonempty index set $I$. In the following, we denote by $\vdash$ the deduction relation and we examine the sets of additional axioms below:

$$\begin{array}{l} -\mathcal{A}\_{\text{FB},\text{ps}} = \{\mathcal{A}\_{1}, \mathcal{A}\_{2}, \mathcal{A}\_{3}, \mathcal{A}\_{4}, \mathcal{A}\_{5}, \mathcal{A}\_{6}, \mathcal{A}\_{9}\}. \\ -\mathcal{A}\_{\text{RB}} = \{\mathcal{A}\_{1}, \mathcal{A}\_{2}, \mathcal{A}\_{7}, \mathcal{A}\_{8}\}. \\ -\mathcal{A}\_{\text{FRB}} = \{\mathcal{A}\_{1}, \mathcal{A}\_{2}, \mathcal{A}\_{3}, \mathcal{A}\_{10}\}. \end{array}$$

**Definition 5.** We say that $P \in \mathbb{P}$ is in $\sim\_{\text{FB},\text{ps}}$-normal form, written $\sim\_{\text{FB},\text{ps}}$-nf, iff it is equal to one of the following:

**–** $\underline{0}$.

**–** $\sum\_{i \in I} a\_i\,.\,P\_i$, where each $P\_i$ is initial and in $\sim\_{\text{FB},\text{ps}}$-nf.

**–** $a^\dagger.\,P$, where $P$ is initial and in $\sim\_{\text{FB},\text{ps}}$-nf.

All initial processes without $\underline{0}$ summands are in $\sim\_{\text{FB},\text{ps}}$-nf. We observe that, in the second case, $a\_1\,.\,P\_1 \sim\_{\text{FB},\text{ps}} a\_2\,.\,P\_2$ trivially implies $a\_1 = a\_2$ and $P\_1 \sim\_{\text{FB},\text{ps}} P\_2$. Likewise, in the third case, $a\_1^\dagger.\,P\_1 \sim\_{\text{FB},\text{ps}} a\_2^\dagger.\,P\_2$ trivially implies $P\_1 \sim\_{\text{FB},\text{ps}} P\_2$. These facts will be exploited in the proof of the forthcoming Theorem 2.

**Lemma 1.** For all $P \in \mathbb{P}$ there is $Q \in \mathbb{P}$ in $\sim\_{\text{FB},\text{ps}}$-nf such that $\mathcal{A}\_{\text{FB},\text{ps}} \vdash P = Q$.

**Theorem 2.** Let $P\_1, P\_2 \in \mathbb{P}$. Then $P\_1 \sim\_{\text{FB},\text{ps}} P\_2$ iff $\mathcal{A}\_{\text{FB},\text{ps}} \vdash P\_1 = P\_2$.

**Definition 6.** We say that $P \in \mathbb{P}$ is in $\sim\_{\text{RB}}$-normal form, written $\sim\_{\text{RB}}$-nf, iff it is equal to one of the following:

**–** $\underline{0}$.

**–** $a^\dagger.\,P$, where $P$ is in $\sim\_{\text{RB}}$-nf.

The normal form above boils down to a final process consisting of a possibly empty, finite sequence of already executed actions terminated by $\underline{0}$. As a consequence, $a\_1^\dagger.\,P\_1 \sim\_{\text{RB}} a\_2^\dagger.\,P\_2$ with $P\_1$ and $P\_2$ in $\sim\_{\text{RB}}$-nf implies $a\_1 = a\_2$ and $P\_1 \sim\_{\text{RB}} P\_2$, because $a\_1^\dagger.\,P\_1$ and $a\_2^\dagger.\,P\_2$ must feature the same sequence of already executed actions and the last executed action of $P\_1$ (resp. $P\_2$), when the process is different from $\underline{0}$, is the same as the last executed action of $a\_1^\dagger.\,P\_1$ (resp. $a\_2^\dagger.\,P\_2$). This fact will be exploited in the proof of the forthcoming Theorem 3.

**Lemma 2.** *For all* $P \in \mathbb{P}$ *there is* $Q \in \mathbb{P}$ *in* $\sim\_{\text{RB}}$*-nf such that* $\mathcal{A}\_{\text{RB}} \vdash P = Q$*.*

**Theorem 3.** *Let* $P\_1, P\_2 \in \mathbb{P}$*. Then* $P\_1 \sim\_{\text{RB}} P\_2$ *iff* $\mathcal{A}\_{\text{RB}} \vdash P\_1 = P\_2$*.*

**Definition 7.** *We say that* $P \in \mathbb{P}$ *is in* $\sim\_{\text{FRB}}$-normal form*, written* $\sim\_{\text{FRB}}$*-nf, iff it is equal to one of the following:*

**–** $\underline{0}$*.*

**–** $\sum\_{i \in I} a\_i\,.\,P\_i$*, where each* $P\_i$ *is initial and in* $\sim\_{\text{FRB}}$*-nf.*

**–** $a^\dagger.\,P$*, where* $P$ *is in* $\sim\_{\text{FRB}}$*-nf.*

**–** $a^\dagger.\,P + \sum\_{i \in I} a\_i\,.\,P\_i$*, where* $P$ *is in* $\sim\_{\text{FRB}}$*-nf and each* $P\_i$ *is initial and in* $\sim\_{\text{FRB}}$*-nf.*

As for the second case above, which is concerned with initial processes, we observe that $a\_1\,.\,P\_1 \sim\_{\text{FRB}} a\_2\,.\,P\_2$ trivially implies $a\_1 = a\_2$ and $P\_1 \sim\_{\text{FRB}} P\_2$. The last two cases together, which are concerned with non-initial processes, yield a process consisting of a finite sequence of already executed actions terminated by an initial process, such that every action in the sequence may have an initial process as an alternative. As a consequence, $a\_1^\dagger.\,P\_1 + P\_1' \sim\_{\text{FRB}} a\_2^\dagger.\,P\_2 + P\_2'$ with $P\_1, P\_2, P\_1', P\_2'$ in $\sim\_{\text{FRB}}$-nf, $P\_1'$ and $P\_2'$ initial, and $P\_1'$ and $P\_2'$ moving only when going back to $\mathit{to\_initial}(a\_1^\dagger.\,P\_1)$ and $\mathit{to\_initial}(a\_2^\dagger.\,P\_2)$, implies $a\_1 = a\_2$, $P\_1 \sim\_{\text{FRB}} P\_2$, and $P\_1' \sim\_{\text{FRB}} P\_2'$. These facts will be exploited in the proof of the forthcoming Theorem 4.

**Lemma 3.** *For all* $P \in \mathbb{P}$ *there is* $Q \in \mathbb{P}$ *in* $\sim\_{\text{FRB}}$*-nf such that* $\mathcal{A}\_{\text{FRB}} \vdash P = Q$*.*

**Theorem 4.** *Let* $P\_1, P\_2 \in \mathbb{P}$*. Then* $P\_1 \sim\_{\text{FRB}} P\_2$ *iff* $\mathcal{A}\_{\text{FRB}} \vdash P\_1 = P\_2$*.*

## **3 The Markovian Case**

In this section, we repeat the investigation over Markovian reversible processes. We start by recalling the theory of continuous-time Markov chains (Section 3.1) including time reversibility (Section 3.2) and lumpability (Section 3.3), then we introduce syntax and semantics for these processes (Section 3.4), we provide the definitions of the three equivalences (Section 3.5), and we study their congruence properties and equational characterizations (Section 3.6).

#### **3.1 Markov Chains: Definition, Representation, Terminology**

A Markov chain is a discrete-state stochastic process characterized by the *memoryless property* [14]. More precisely, a stochastic process $X(t)$, $t \in \mathbb{R}\_{\geq 0}$, over a discrete state space $S$ is a *continuous-time Markov chain (CTMC)* iff for all $n \in \mathbb{N}$, time instants $t\_0 < t\_1 < \dots < t\_n < t\_{n+1} \in \mathbb{R}\_{\geq 0}$, and states $s\_0, s\_1, \dots, s\_n, s\_{n+1} \in S$ it holds that $\Pr\{X(t\_{n+1}) = s\_{n+1} \mid X(t\_i) = s\_i, 0 \leq i \leq n\} = \Pr\{X(t\_{n+1}) = s\_{n+1} \mid X(t\_n) = s\_n\}$, i.e., the probability of moving from one state to another does not depend on the particular path that has been followed in the past to reach the current state, hence that path can be forgotten.

A CTMC is representable as a labeled transition system or as a state-indexed matrix. In the first case, each transition is labeled with some probabilistic information describing the evolution from the source state to the target state of the transition. In the second case, the same information is stored into an entry, indexed by those two states, of a matrix. The value of this probabilistic information is a function of the time at which the state change takes place.

For the sake of simplicity, we restrict ourselves to time-homogeneous CTMCs, in which conditional probabilities of the form $\Pr\{X(t + t') = s' \mid X(t) = s\}$ do not depend on $t$, so that the considered information is simply a positive real number given by $\lim\_{t' \to 0} \Pr\{X(t + t') = s' \mid X(t) = s\} \, / \, t'$. This is called the rate at which the CTMC moves from state $s$ to state $s'$ and uniquely characterizes the exponentially distributed time taken by the considered move.

A CTMC is irreducible iff each of its states is reachable from every other state with probability greater than 0. A state s ∈ S is recurrent iff the CTMC will eventually return to s with probability 1, in which case s is positive recurrent iff the expected number of steps until the CTMC returns to it is finite. A CTMC is ergodic iff it is irreducible and all of its states are positive recurrent; ergodicity coincides with irreducibility in the case that the CTMC has finitely many states.

Every time-homogeneous and ergodic CTMC $X(t)$ is stationary, which means that $(X(t\_i + t'))\_{1 \leq i \leq n}$ has the same joint distribution as $(X(t\_i))\_{1 \leq i \leq n}$ for all $n \in \mathbb{N}\_{\geq 1}$ and $t\_1 < \dots < t\_n, t' \in \mathbb{R}\_{\geq 0}$. In this case, $X(t)$ has a unique steady-state probability distribution *π* that for all $s \in S$ fulfills $\pi(s) = \lim\_{t \to \infty} \Pr\{X(t) = s \mid X(0) = s'\}$ for any $s' \in S$. These probabilities can be computed by solving the linear system of global balance equations *π* · **Q** = **0** subject to $\sum\_{s \in S} \pi(s) = 1$ and $\pi(s) \in \mathbb{R}\_{>0}$ for all $s \in S$. The infinitesimal generator matrix **Q** contains for each pair of distinct states the rate of the corresponding move, which is 0 in the absence of a direct move between them, while $q\_{s,s} = -\sum\_{s' \neq s} q\_{s,s'}$ for all $s \in S$, i.e., every diagonal element contains the opposite of the total exit rate of the corresponding state, so that each row of **Q** sums up to 0.
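For a small chain the global balance equations can be solved exactly. The following Python sketch builds the infinitesimal generator from the off-diagonal rates and solves *π* · **Q** = **0** together with the normalization condition by Gauss-Jordan elimination over rationals. The three-state chain, its rates, and the function names `generator` and `steady_state` are assumptions made for illustration, and the solver presumes an ergodic chain so that the system has a unique solution.

```python
from fractions import Fraction

def generator(states, rates):
    """Build Q as a dict from {(s, s'): rate}; each diagonal entry is minus the exit rate."""
    q = {(s, t): Fraction(rates.get((s, t), 0))
         for s in states for t in states if s != t}
    for s in states:
        q[(s, s)] = -sum(q[(s, t)] for t in states if t != s)
    return q

def steady_state(states, q):
    """Solve pi * Q = 0 with sum(pi) = 1, assuming the chain is ergodic."""
    n = len(states)
    # One balance equation per column of Q; the last one is replaced by normalization.
    rows = [[q[(s, t)] for s in states] + [Fraction(0)] for t in states[:-1]]
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return {s: rows[i][n] for i, s in enumerate(states)}

# Hypothetical chain: s0 moves to s1 at rate 1 and to s2 at rate 2,
# while s1 and s2 both move back to s0 at rate 3.
states = ["s0", "s1", "s2"]
q = generator(states, {("s0", "s1"): 1, ("s1", "s0"): 3,
                       ("s0", "s2"): 2, ("s2", "s0"): 3})
pi = steady_state(states, q)
print(pi)
```

Exact rational arithmetic is used here only to keep the example free of rounding; it sidesteps neither the state space explosion nor the numerical issues that arise for large chains.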

#### **3.2 Time Reversibility of Continuous-Time Markov Chains**

Due to state space explosion and numerical stability problems [27], the calculation of the solution of the global balance equation system is not always feasible. However, it can be tackled in the case that the behavior of the considered CTMC remains the same when the direction of time is reversed. A CTMC $X(t)$ is time reversible iff $(X(t\_i))\_{1 \leq i \leq n}$ has the same joint distribution as $(X(t' - t\_i))\_{1 \leq i \leq n}$ for all $n \in \mathbb{N}\_{\geq 1}$ and $t\_1 < \dots < t\_n, t' \in \mathbb{R}\_{\geq 0}$. In this case, $X(t)$ and its time-reversed version $X^{r}(t) = X(t' - t)$ are stochastically identical, in particular they are stationary and share the same steady-state probability distribution *π*. In order for a stationary CTMC $X(t)$ to be time reversible, it is necessary and sufficient that the partial balance equations $\pi(s) \cdot q\_{s,s'} = \pi(s') \cdot q\_{s',s}$ are satisfied for all $s, s' \in S$ such that $s \neq s'$ or, equivalently, that $q\_{s\_1,s\_2} \cdot \ldots \cdot q\_{s\_{n-1},s\_n} \cdot q\_{s\_n,s\_1} = q\_{s\_1,s\_n} \cdot q\_{s\_n,s\_{n-1}} \cdot \ldots \cdot q\_{s\_2,s\_1}$ for all $n \in \mathbb{N}\_{\geq 2}$ and distinct $s\_1, \dots, s\_n \in S$ [13].

The time-reversed version $X^{r}(t)$ of a stationary CTMC $X(t)$ can be defined even when $X(t)$ is not reversible. As shown in [13,10], this is accomplished by using the steady-state probability distribution *π* of $X(t)$, with $X^{r}(t)$ turning out to be a CTMC too and having the same steady-state probability distribution *π*. More precisely, $q^{r}\_{s\_j,s\_i} = q\_{s\_i,s\_j} \cdot \pi(s\_i) / \pi(s\_j)$ for all $s\_i \neq s\_j$, i.e., the rate from state $s\_j$ to state $s\_i$ in the time-reversed CTMC is proportional to the rate from state $s\_i$ to state $s\_j$ in the original CTMC, where the coefficient is given by the ratio of $\pi(s\_i)$ to $\pi(s\_j)$. Note that the time-reversed version of $X^{r}(t)$ is $X(t)$.
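Both the partial balance test and the construction of the time-reversed rates are one-line computations once the steady-state distribution is known. The following Python sketch illustrates them on two hypothetical three-state chains, a star-shaped one and a one-way cycle; the chains, their steady-state distributions (supplied by hand and assumed correct), and the function names are assumptions of this sketch.

```python
from fractions import Fraction

def rate_matrix(states, rates):
    """Off-diagonal rates as a full dict, with 0 where no direct move exists."""
    return {(s, t): Fraction(rates.get((s, t), 0))
            for s in states for t in states if s != t}

def is_time_reversible(states, q, pi):
    """Check pi(s) * q[s, s'] == pi(s') * q[s', s] for all distinct s, s'."""
    return all(pi[s] * q[(s, t)] == pi[t] * q[(t, s)]
               for s in states for t in states if s != t)

def reversed_rates(states, q, pi):
    """Rates of the time-reversed chain: q_r[sj, si] = q[si, sj] * pi(si) / pi(sj)."""
    return {(t, s): q[(s, t)] * pi[s] / pi[t]
            for s in states for t in states if s != t}

states = ["s0", "s1", "s2"]

# Star-shaped chain: every move can be undone along the same edge.
star = rate_matrix(states, {("s0", "s1"): 1, ("s1", "s0"): 3,
                            ("s0", "s2"): 2, ("s2", "s0"): 3})
pi_star = {"s0": Fraction(1, 2), "s1": Fraction(1, 6), "s2": Fraction(1, 3)}

# One-way cycle s0 -> s1 -> s2 -> s0 with unit rates and uniform steady state.
cycle = rate_matrix(states, {("s0", "s1"): 1, ("s1", "s2"): 1, ("s2", "s0"): 1})
pi_cycle = {s: Fraction(1, 3) for s in states}

print(is_time_reversible(states, star, pi_star))
print(is_time_reversible(states, cycle, pi_cycle))
print(reversed_rates(states, cycle, pi_cycle)[("s1", "s0")])
```

For the star-shaped chain the reversed rates coincide with the original ones, whereas reversing the cycle yields the cycle traversed in the opposite direction; applying `reversed_rates` twice gives back the original rates in both cases.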

#### **3.3 Lumpability of Continuous-Time Markov Chains**

A different approach to the state space explosion problem consists of aggregating states and transitions in a suitable way. In particular, the focus is on exact aggregations, i.e., partitions of the state space such that the probability of being in any of the aggregated states is equal to the sum of the probabilities of the original states it contains. In the following, we consider a time-homogeneous CTMC X(t) with state space S and infinitesimal generator matrix **Q**; the formulas for the elements of the matrix of the resulting aggregations are taken from [2].

The first notion of exact aggregation that we address is strong lumpability [14]. It was later renamed ordinary lumpability in [28,5], which we prefer to adopt so as not to generate confusion with the use of strong and weak for behavioral equivalences in concurrency theory.

**Definition 8.** The partition $P$ induced by an equivalence relation $L$ over $S$ is an ordinary lumping iff for all $(s\_1, s\_2) \in L$ and $C \in P$ such that $s\_1, s\_2 \notin C$:

$$\sum\_{s' \in C} q\_{s\_1, s'} = \sum\_{s' \in C} q\_{s\_2, s'}$$

The resulting CTMC with state space $P$ has infinitesimal generator matrix $\mathbf{Q}'$ defined as follows for all $C\_1, C\_2 \in P$ such that $C\_1 \neq C\_2$:

$$q'\_{C\_1, C\_2} = \sum\_{s' \in C\_2} q\_{s, s'}$$

where s ∈ C1.

The second notion of exact aggregation is exact lumpability [25,28,5], which further enjoys the property that all the original states contained in the same aggregated state have the same probability. While ordinary lumpability considers the rates of outgoing transitions and does not check for rate equality within any class, exact lumpability considers the rates of incoming transitions and applies the rate equality check inside each class too.

**Definition 9.** The partition $P$ induced by an equivalence relation $L$ over $S$ is an exact lumping iff for all $(s\_1, s\_2) \in L$ and $C \in P$:

$$\sum\_{s' \in C} q\_{s', s\_1} = \sum\_{s' \in C} q\_{s', s\_2}$$

The resulting CTMC with state space $P$ has infinitesimal generator matrix $\mathbf{Q}'$ defined as follows for all $C\_1, C\_2 \in P$ such that $C\_1 \neq C\_2$:

$$q'\_{C\_1, C\_2} = \sum\_{s' \in C\_1} q\_{s', s} \cdot (|C\_2|/|C\_1|)$$

where s ∈ C2.

The third notion of exact aggregation is strict lumpability [5], which is a combination of the previous two.

**Definition 10.** *The partition* P *induced by an equivalence relation* L *over* S *is a* strict lumping *iff it is both an ordinary lumping and an exact lumping.*
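Definitions 8 to 10 can be checked directly on a finite generator matrix. The following Python sketch is a minimal illustration: the star-shaped three-state chain, the chosen rates, and the function names are assumptions of this sketch. A partition is given as a list of lists of states.

```python
from fractions import Fraction

def generator(states, rates):
    """Build Q as a dict from {(s, s'): rate}; each diagonal entry is minus the exit rate."""
    q = {(s, t): Fraction(rates.get((s, t), 0))
         for s in states for t in states if s != t}
    for s in states:
        q[(s, s)] = -sum(q[(s, t)] for t in states if t != s)
    return q

def is_ordinary_lumping(q, partition):
    """Equal cumulative outgoing rates towards every other class (Definition 8)."""
    for i, block in enumerate(partition):
        for j, target in enumerate(partition):
            if i == j:
                continue  # classes containing the two states are not inspected
            if len({sum(q[(s, t)] for t in target) for s in block}) > 1:
                return False
    return True

def is_exact_lumping(q, partition):
    """Equal cumulative incoming rates from every class, own class included (Definition 9)."""
    for block in partition:
        for source in partition:
            if len({sum(q[(t, s)] for t in source) for s in block}) > 1:
                return False
    return True

def is_strict_lumping(q, partition):
    """Both ordinary and exact (Definition 10)."""
    return is_ordinary_lumping(q, partition) and is_exact_lumping(q, partition)

def star(l1, l2, m1, m2):
    """Hypothetical chain: s0 -> s1 at l1, s0 -> s2 at l2, s1 -> s0 at m1, s2 -> s0 at m2."""
    return generator(["s0", "s1", "s2"],
                     {("s0", "s1"): l1, ("s0", "s2"): l2,
                      ("s1", "s0"): m1, ("s2", "s0"): m2})

partition = [["s0"], ["s1", "s2"]]
print(is_ordinary_lumping(star(1, 2, 3, 3), partition),
      is_exact_lumping(star(1, 2, 3, 3), partition))
print(is_strict_lumping(star(1, 1, 3, 3), partition))
```

With equal return rates but different forward rates the partition is ordinary without being exact, while equal forward and equal return rates make it strict. The own-class check in `is_exact_lumping` relies on the diagonal entries of the generator, which is why the sketch builds them explicitly.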

The relationships between lumpability and time reversibility for CTMCs have been investigated in [18,19]:


*Example 3.* Consider the three time-reversible, ergodic CTMCs depicted below:

When solving the global balance equations for the first CTMC from the left, we obtain:

$$\begin{array}{c} \pi(s\_0) = \frac{\mu\_1 \cdot \mu\_2}{\mu\_1 \cdot \mu\_2 + \lambda\_1 \cdot \mu\_2 + \lambda\_2 \cdot \mu\_1} \\ \pi(s\_1) = \frac{\lambda\_1 \cdot \mu\_2}{\mu\_1 \cdot \mu\_2 + \lambda\_1 \cdot \mu\_2 + \lambda\_2 \cdot \mu\_1} \\ \pi(s\_2) = \frac{\lambda\_2 \cdot \mu\_1}{\mu\_1 \cdot \mu\_2 + \lambda\_1 \cdot \mu\_2 + \lambda\_2 \cdot \mu\_1} \end{array}$$

If $\lambda\_1 = \lambda\_2$ but $\mu\_1 \neq \mu\_2$, then no exact aggregation exists for that CTMC. If $\mu\_1 = \mu\_2 = \mu$ but $\lambda\_1 \neq \lambda\_2$, then the second CTMC from the left is an ordinary lumping of the first one, where the aggregated state $s'$ contains the two original states $s\_1$ and $s\_2$ and the solution of the global balance equations is the following:

$$\begin{array}{rcl} \pi(s'\_0) = \frac{\mu}{\mu + \lambda\_1 + \lambda\_2} = \pi(s\_0) \\ \pi(s') = \frac{\lambda\_1 + \lambda\_2}{\mu + \lambda\_1 + \lambda\_2} = \pi(s\_1) + \pi(s\_2) \end{array}$$

with $\pi(s\_1) \neq \pi(s\_2)$.

If $\lambda\_1 = \lambda\_2 = \lambda$ and $\mu\_1 = \mu\_2 = \mu$, then the third CTMC from the left is a strict (i.e., ordinary and exact) lumping of the first one, where the aggregated state $s''$ contains the two original states $s\_1$ and $s\_2$ and the solution of the global balance equations is the following:

$$\begin{array}{c} \pi(s\_0'') = \frac{\mu}{\mu + 2 \cdot \lambda} = \pi(s\_0) \\ \pi(s'') = \frac{2 \cdot \lambda}{\mu + 2 \cdot \lambda} = \pi(s\_1) + \pi(s\_2) \end{array}$$

with π(s1) = π(s2).

*Example 4.* The considered notions of lumpability are distinct from each other. On the one hand, in the previous example the second CTMC from the left is an ordinary lumping of the first one, but not an exact lumping as $\pi(s\_1) \neq \pi(s\_2)$ when $\mu\_1 = \mu\_2$ and $\lambda\_1 \neq \lambda\_2$. On the other hand, the CTMC on the right depicted below is an exact lumping of the CTMC on the left, where the aggregated state $s'$ contains the two original states $s\_1$ and $s\_2$, when $\mu' + \mu'' = \nu' + \nu''$ (corresponding to $q\_{s\_1,s\_1} + q\_{s\_2,s\_1} = q\_{s\_1,s\_2} + q\_{s\_2,s\_2}$, i.e., $-(\mu' + \mu'') + 0 = 0 - (\nu' + \nu'')$), but it is not an ordinary lumping if $\mu' \neq \nu'$ and $\mu'' \neq \nu''$:

Note that the two CTMCs above are ergodic, but not time reversible.

#### **3.4 Syntax and Semantics of Markovian Reversible Processes**

We have seen in Section 2 that a single forward transition relation is enough for nondeterministic processes in a reversible setting. This is due to the fact that $P \xrightarrow{a} P'$ iff $P' \overset{a}{\rightsquigarrow} P$, where according to [24] the backward transition relation $\rightsquigarrow$ should be used in the second clause of the definition of $\sim\_{\text{FRB}}$ and hence in the definition of $\sim\_{\text{RB}}$ as well.

A transition relation in a single direction is no longer sufficient in the case of Markovian reversible processes. The reason is that every transition of these processes is also labeled with its rate, a positive real number that uniquely identifies the exponentially distributed duration of the action associated with the transition. In general, the rate may be different depending on whether the transition goes forward or backward, without necessarily affecting time reversibility.

When moving from nondeterministic reversible processes to Markovian ones, in the syntax we thus need to replace a and a† with <a, λ, μ> and <a†, λ, μ> respectively, where <sup>λ</sup> <sup>∈</sup> <sup>R</sup>><sup>0</sup> is the rate of the forward <sup>a</sup>-transition whilst <sup>μ</sup> <sup>∈</sup> <sup>R</sup>><sup>0</sup> is the rate of the backward a-transition. Predicates *initial*, *final*, and *reachable* are extended accordingly and the set of reachable processes is denoted by PM.

In order for the semantics to be consistent with the CTMC theory recalled in Sections 3.1 to 3.3, we cannot use a transition relation → with forward rates separated from a transition relation ⇢ with backward rates, as would be the case if we applied the approach of [24]. For instance, the two Markovian processes depicted below would be identified by a Markovian variant of ∼FRB relying on → and ⇢, but the CTMC underlying the labeled transition system of the process on the right is not an exact lumping of the CTMC underlying the labeled transition system of the process on the left if λ<sub>1</sub> ≠ λ<sub>2</sub>, i.e., this Markovian variant of ∼FRB would not induce strict lumping:


**Table 3.** Operational semantic rules for Markovian reversible processes

We thus keep using a single transition relation, which is →<sub>M</sub> ⊆ ℙ<sub>M</sub> × (A × ℝ<sub>>0</sub>) × ℙ<sub>M</sub> defined in Table 3. Unlike the one in Section 2.2, it embodies both transitions with forward rates and transitions with backward rates. This has been accomplished not only by extending all the rules in Table 1 according to the new richer syntax, but also by adding a rule for action prefix (Act<sub>r</sub>, where r stands for reverse) that generates transitions with backward rates.

Any state corresponding to a process different from 0 can now have several incoming transitions too. The labeled transition system underlying an initial process turns out to be a tree-like extension of a birth-death process [23,21], with branching points corresponding to occurrences of +. The reason is that between any pair of connected states there can only be a transition from the former state to the latter and a transition from the latter state back to the former, with the two transitions sharing the same name as they are generated by the same action <a, λ, μ>. The underlying CTMC, obtained by removing actions from transitions, turns out to be not only ergodic, but also time reversible due to its tree-like birth-death structure [13]. The considered calculus thus combines causality-consistent reversibility with time reversibility like in [4].
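The time reversibility claim can be checked by hand on a small instance. The following Python sketch is only an illustration: the three-state tree, the rate values, and all helper names are assumptions made for the example and do not come from the calculus itself. It derives candidate stationary weights from the detailed balance equations π(i)·q<sub>ij</sub> = π(j)·q<sub>ji</sub> along the edges of the tree and then verifies that these weights also satisfy global balance, which is what time reversibility of an ergodic CTMC amounts to.

```python
from fractions import Fraction as F

# Hypothetical tree-like birth-death CTMC: root 0 with children 1 and 2.
# Each edge (parent, child) carries a forward rate (parent -> child)
# and a backward rate (child -> parent); the values are made up.
edges = {(0, 1): (F(2), F(3)), (0, 2): (F(5), F(1))}
states = [0, 1, 2]

def rate(i, j):
    """Off-diagonal rate q_ij of the CTMC (0 if there is no transition)."""
    if (i, j) in edges:
        return edges[(i, j)][0]
    if (j, i) in edges:
        return edges[(j, i)][1]
    return F(0)

def stationary_from_detailed_balance():
    """Weights w with w[i] * q_ij == w[j] * q_ji on every edge, normalised."""
    w = {0: F(1)}
    # Assumes every parent appears in w before its children are visited.
    for (parent, child), (fwd, bwd) in edges.items():
        w[child] = w[parent] * fwd / bwd
    total = sum(w.values())
    return {s: w[s] / total for s in states}

def satisfies_global_balance(pi):
    """Check pi Q = 0: inflow into each state equals outflow from it."""
    for j in states:
        inflow = sum(pi[i] * rate(i, j) for i in states if i != j)
        outflow = pi[j] * sum(rate(j, k) for k in states if k != j)
        if inflow != outflow:
            return False
    return True

pi = stationary_from_detailed_balance()
print(pi, satisfies_global_balance(pi))
```

Because the graph is a tree, the detailed balance equations can always be solved edge by edge without running into a cycle that might contradict them, which is the intuition behind the cited result.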

*Example 5.* The labeled transition systems generated by the rules in Table 3 for the two Markovian processes <a, λ, μ> . 0 + <a, λ, μ> . 0 and <a, λ, μ> . 0 are shown below:

The generation of a single a-transition from <a, λ, μ> . 0 + <a, λ, μ> . 0 on the left would have been wrong, as it would not have reflected the total exit rate 2·λ of the source state. Several solutions to this problem have been proposed for Markovian process calculi without reversibility, while in our setting the problem is naturally prevented by action decorations within processes.

#### **3.5 Bisimilarities for Markovian Reversible Processes**

We now define the Markovian variants of forward bisimilarity, reverse bisimilarity, and forward-reverse bisimilarity based on the CTMC theory recalled in Sections 3.1 to 3.3.

In the forward case, it is known that the (discrete-time) probabilistic bisimilarity of [17] and the (continuous-time) Markovian bisimilarity of [12] induce an ordinary lumping on the Markov chains underlying the considered processes, hence so does ∼MFB below. Unlike Definition 8, in Definition 11 the rate equality check is applied inside each class too and hence not all ordinary lumpings can be induced by ∼MFB, in particular not the one identifying every pair of processes.

The reason is that while in Markov chain theory one is interested in state probabilities, in concurrency theory one experiments with processes by observing the labels of the transitions that are executed [9,1,17]. In particular, two processes with different total exit rates cannot be identified by ∼MFB below, which is perfectly justifiable from an observational viewpoint. As an example, consider a state with a self-looping λ-transition and a state with a self-looping μ-transition. The two states would be deemed ordinarily lumpable according to Definition 8, although the more λ and μ are different, the easier it is for an observer to tell those two states apart.

In the following, {| and |} denote multiset parentheses, while ℙ<sub>M</sub>/ℬ is the set of equivalence classes induced by the equivalence relation ℬ over ℙ<sub>M</sub>.

**Definition 11.** *We say that* P<sub>1</sub>, P<sub>2</sub> ∈ ℙ<sub>M</sub> *are* Markovian forward bisimilar*, written* P<sub>1</sub> ∼MFB P<sub>2</sub>*, iff* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ *for some Markovian forward bisimulation* ℬ*. An equivalence relation* ℬ *over* ℙ<sub>M</sub> *is a* Markovian forward bisimulation *iff for all* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ*,* a ∈ A*, and* C ∈ ℙ<sub>M</sub>/ℬ*:*

$$\operatorname{rate}\_{\text{out}}(P\_1, a, C) = \operatorname{rate}\_{\text{out}}(P\_2, a, C)$$

*where*

$$\operatorname{rate}\_{\text{out}}(P, a, C) = \sum \{\!|\, \xi \in \mathbb{R}\_{>0} \mid \exists P' \in C.\ P \xrightarrow{a, \xi}\_{\text{M}} P' \,|\!\}.$$
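As a minimal sketch of how the condition of Definition 11 can be checked on a finite state space, the Python fragment below represents a labeled transition system as a multiset of (source, action, rate, target) tuples. The state names, the rate values, and the function names are hypothetical; in particular the toy system is written by hand and is not claimed to be generated by the rules of Table 3.

```python
from fractions import Fraction as F

lam, mu = F(2), F(3)  # made-up rates
# Toy LTS as a multiset (list) of (source, action, rate, target) tuples.
transitions = [
    ("P", "a", lam, "P1"), ("P1", "a", mu, "P"),
    ("P", "a", lam, "P2"), ("P2", "a", mu, "P"),
    ("Q", "a", 2 * lam, "Q1"), ("Q1", "a", mu, "Q"),
]

def rate_out(p, a, block):
    """Sum, with multiplicity, of the rates of a-transitions from p into block."""
    return sum(r for (src, act, r, tgt) in transitions
               if src == p and act == a and tgt in block)

def is_forward_bisimulation(partition, actions):
    """Check the rate equality of Definition 11 for the equivalence
    relation whose classes are the blocks of `partition`."""
    for block in partition:
        for p in block:
            for q in block:
                for a in actions:
                    for c in partition:
                        if rate_out(p, a, c) != rate_out(q, a, c):
                            return False
    return True

coarse = [{"P", "Q"}, {"P1", "P2", "Q1"}]
print(is_forward_bisimulation(coarse, ["a"]))
```

In this toy system the two a-transitions of P with rate λ each add up to the same cumulative rate 2·λ as the single a-transition of Q, so the partition that groups P with Q passes the check. The check is applied to every pair inside a block, including pairs whose target class is the block itself, which mirrors the remark that rate equality is also verified inside each class.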

In the reverse case, incoming transitions are considered instead of outgoing ones. As in [6,26], in the definition of ∼MRB below an additional condition about total exit rate equality is needed, which in Definition 9 is naturally handled through the diagonal elements of the infinitesimal generator matrix. It is easily seen that ∼MRB induces an exact lumping on the Markov chains underlying the considered processes, but not all exact lumpings can be induced.

**Definition 12.** *We say that* P<sub>1</sub>, P<sub>2</sub> ∈ ℙ<sub>M</sub> *are* Markovian reverse bisimilar*, written* P<sub>1</sub> ∼MRB P<sub>2</sub>*, iff* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ *for some Markovian reverse bisimulation* ℬ*. An equivalence relation* ℬ *over* ℙ<sub>M</sub> *is a* Markovian reverse bisimulation *iff for all* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ *and* a ∈ A*:*

$$\operatorname{rate}\_{\text{out}}(P\_1, a, \mathbb{P}\_{\text{M}}) = \operatorname{rate}\_{\text{out}}(P\_2, a, \mathbb{P}\_{\text{M}})$$

*and for all* C ∈ ℙ<sub>M</sub>/ℬ*:*

$$\operatorname{rate}\_{\text{in}}(P\_1, a, C) = \operatorname{rate}\_{\text{in}}(P\_2, a, C)$$

*where*

$$\operatorname{rate}\_{\text{in}}(P, a, C) = \sum \{\!|\, \xi \in \mathbb{R}\_{>0} \mid \exists P' \in C.\ P' \xrightarrow{a, \xi}\_{\text{M}} P \,|\!\}.$$
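The two conditions of Definition 12 can be sketched in the same style. The fragment below is again only illustrative: the hand-written transition system, the rates, and the function names are assumptions, and the system is not claimed to be produced by the operational rules.

```python
from fractions import Fraction as F

lam, mu = F(2), F(3)  # made-up rates
# Toy LTS as a multiset (list) of (source, action, rate, target) tuples.
transitions = [
    ("P", "a", lam, "P1"), ("P1", "a", mu, "P"),
    ("P", "a", lam, "P2"), ("P2", "a", mu, "P"),
    ("Q", "a", 2 * lam, "Q1"), ("Q1", "a", mu, "Q"),
]
states = {"P", "P1", "P2", "Q", "Q1"}

def rate_out(p, a, block):
    """Cumulative rate of a-transitions leaving p and entering block."""
    return sum(r for (src, act, r, tgt) in transitions
               if src == p and act == a and tgt in block)

def rate_in(p, a, block):
    """Cumulative rate of a-transitions entering p from states of block."""
    return sum(r for (src, act, r, tgt) in transitions
               if tgt == p and act == a and src in block)

def is_reverse_bisimulation(partition, actions):
    """Check total exit rate equality and incoming rate equality."""
    for block in partition:
        for p in block:
            for q in block:
                for a in actions:
                    if rate_out(p, a, states) != rate_out(q, a, states):
                        return False
                    if any(rate_in(p, a, c) != rate_in(q, a, c)
                           for c in partition):
                        return False
    return True

coarse = [{"P", "Q"}, {"P1", "P2", "Q1"}]
fine = [{"P"}, {"Q"}, {"P1", "P2"}, {"Q1"}]
print(is_reverse_bisimulation(coarse, ["a"]), is_reverse_bisimulation(fine, ["a"]))
```

Here the coarse partition is rejected even though P and Q have the same total exit rate, because P receives two incoming a-transitions with rate μ each whereas Q receives only one; the finer partition that keeps P and Q apart is accepted.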

In the forward-reverse case, ∼MFRB below induces a strict lumping on the Markov chains underlying the considered processes.

**Definition 13.** *We say that* P<sub>1</sub>, P<sub>2</sub> ∈ ℙ<sub>M</sub> *are* Markovian forward-reverse bisimilar*, written* P<sub>1</sub> ∼MFRB P<sub>2</sub>*, iff* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ *for some Markovian forward-reverse bisimulation* ℬ*. An equivalence relation* ℬ *over* ℙ<sub>M</sub> *is a* Markovian forward-reverse bisimulation *iff for all* (P<sub>1</sub>, P<sub>2</sub>) ∈ ℬ*,* a ∈ A*, and* C ∈ ℙ<sub>M</sub>/ℬ*:*

$$\begin{aligned} \operatorname{rate}\_{\text{out}}(P\_1, a, C) &= \operatorname{rate}\_{\text{out}}(P\_2, a, C) \\ \operatorname{rate}\_{\text{in}}(P\_1, a, C) &= \operatorname{rate}\_{\text{in}}(P\_2, a, C) \end{aligned}$$

It is worth noting that any aggregated state resulting from an ordinary lumping is ∼MFB-equivalent to each of the original states it contains, while this is not necessarily the case for exact lumping and ∼MRB, where ∼MRB-equivalence certainly holds only among the original states contained in an aggregated state. This is the *fourth asymmetry* between forward and reverse bisimilarity.

*Example 6.* The three CTMCs of Example 3 can be viewed as underlying the labeled transition systems of the following three initial processes:


with the only exception of the following two contained in the same aggregate:

<a†, λ, μ> . 0 + <a, λ, μ> . 0 ∼MRB <a, λ, μ> . 0 + <a†, λ, μ> . 0

Unlike ∼FB, it holds that ∼MFB is sensitive to the presence of the past, so that in Definition 11 it is not necessary to require *initial*(P<sub>1</sub>) ⇐⇒ *initial*(P<sub>2</sub>) to gain compositionality with respect to alternative composition. For example:

<a†, λ, μ> . <b, δ, γ> . 0 ≁MFB <b, δ, γ> . 0

because the process on the left has an outgoing a-transition with rate μ that cannot be matched by the process on the right.

Furthermore, unlike ∼FB,ps, it holds that ∼MFB cannot identify processes with a different past. For instance:

<a†, λ, μ> . 0 ≁MFB <b†, δ, γ> . 0

whenever a ≠ b or μ ≠ γ, as in that case the outgoing a-transition on the left cannot be matched by the outgoing b-transition on the right.

Similarly, unlike ∼RB, we have that ∼MRB is sensitive to the presence of the future and cannot identify processes with a different future. As an example:

<a, λ, μ> . 0 ≁MRB 0

because the process on the left has an incoming a-transition with rate μ that cannot be matched by the process on the right. As another example:

<a, λ, μ> . 0 ≁MRB <b, δ, γ> . 0

whenever a ≠ b or μ ≠ γ, as in that case the incoming a-transition on the left cannot be matched by the incoming b-transition on the right.

We conclude by showing that ∼MFRB coincides with ∼MRB (whilst ∼MFB is strictly coarser) thus extending the first asymmetry between forward and reverse bisimilarities (see page 5). This result stems from the definition of the operational semantics and the consequent time reversibility of the underlying CTMCs.

**Theorem 5.** Let P<sub>1</sub>, P<sub>2</sub> ∈ ℙ<sub>M</sub>. Then P<sub>1</sub> ∼MFRB P<sub>2</sub> iff P<sub>1</sub> ∼MRB P<sub>2</sub>.

#### **3.6 Congruence Properties and Equational Characterizations**

We start by observing that ∼MFB is not totally sensitive to the past, in the same way as ∼MRB is not totally sensitive to the future. For both equivalences this results in a compositionality violation with respect to +. As an example:

<a, λ, λ> . 0 ∼MFRB <a†, λ, λ> . 0

<a, λ, λ> . 0 + <c, κ1, κ2> . 0 ≁MFRB <a†, λ, λ> . 0 + <c, κ1, κ2> . 0

because in <a†, λ, λ> . 0 + <c, κ1, κ2> . 0 action c is disabled due to the presence of the already executed action a†, while in <a, λ, λ> . 0 + <c, κ1, κ2> . 0 action c is enabled as there are no past actions preventing it from occurring.

Note that ∼MFRB would not equate the first two processes if their two rates were λ<sub>1</sub> and λ<sub>2</sub> with λ<sub>1</sub> ≠ λ<sub>2</sub> or there were any other process in place of 0. Therefore, when investigating congruence with respect to alternative composition, we will consider the set of processes ℙ′<sub>M</sub> = ℙ<sub>M</sub> \ {<a, λ, λ> . 0 | a ∈ A, λ ∈ ℝ<sub>>0</sub>}.

**Theorem 6.** Let ∼<sub>M</sub> ∈ {∼MFB, ∼MRB} and P<sub>1</sub>, P<sub>2</sub> ∈ ℙ<sub>M</sub>:

- <a, λ, μ> . P<sub>1</sub> ∼<sub>M</sub> <a, λ, μ> . P<sub>2</sub> provided that initial(P<sub>1</sub>) ∧ initial(P<sub>2</sub>).
- <a†, λ, μ> . P<sub>1</sub> ∼<sub>M</sub> <a†, λ, μ> . P<sub>2</sub>.
- P<sub>1</sub> + P ∼<sub>M</sub> P<sub>2</sub> + P and P + P<sub>1</sub> ∼<sub>M</sub> P + P<sub>2</sub> provided that initial(P) ∨ (initial(P<sub>1</sub>) ∧ initial(P<sub>2</sub>)).

With regard to equational characterizations, as expected ∼MFB and ∼MRB are such that alternative composition is associative and commutative and admits 0 as neutral element. This is formalized by axioms A<sub>M,1</sub> to A<sub>M,3</sub> in Table 4.

Markovian variants of axioms A<sub>4</sub> to A<sub>6</sub> in Table 2 are not valid for ∼MFB because this behavioral equivalence is sensitive to the presence of the past, cannot identify processes with a different past, and views all the transitions as outgoing.


**Table 4.** Axioms characterizing bisimilarity over Markovian reversible processes

Likewise, Markovian variants of axioms A<sub>7</sub> and A<sub>8</sub> in Table 2 are not valid for ∼MRB because this behavioral equivalence is sensitive to the presence of the future, cannot identify processes with a different future, and views all the transitions as incoming.

As for idempotency, Markovian variants of axioms A<sub>9</sub> and A<sub>10</sub> in Table 2, which are formalized by axioms A<sub>M,4</sub> and A<sub>M,5</sub> in Table 4, are valid only for ∼MFB as shown in Example 6. We further observe that in the considered example:

<a†, λ, μ> . 0 + <a, λ, μ> . 0 ∼MRB <a, λ, μ> . 0 + <a†, λ, μ> . 0

can be proved via axiom A<sub>M,2</sub>.

**Theorem 7.** Let A<sub>MFB</sub> = {A<sub>M,1</sub>, A<sub>M,2</sub>, A<sub>M,3</sub>, A<sub>M,4</sub>, A<sub>M,5</sub>} and P<sub>1</sub>, P<sub>2</sub> ∈ ℙ′<sub>M</sub>. Then P<sub>1</sub> ∼MFB P<sub>2</sub> iff A<sub>MFB</sub> ⊢ P<sub>1</sub> = P<sub>2</sub>.

**Theorem 8.** Let A<sub>MRB</sub> = {A<sub>M,1</sub>, A<sub>M,2</sub>, A<sub>M,3</sub>} and P<sub>1</sub>, P<sub>2</sub> ∈ ℙ′<sub>M</sub>. Then P<sub>1</sub> ∼MRB P<sub>2</sub> iff A<sub>MRB</sub> ⊢ P<sub>1</sub> = P<sub>2</sub>.

## **4 Conclusions**

In this paper, we have discovered the following asymmetries that shed light on forward bisimilarity, reverse bisimilarity, and forward-reverse bisimilarity:


As future work, we plan to investigate logical characterizations of the same equivalences, along with what changes when admitting irreversible actions.

**Acknowledgments.** This research has been supported by the PRIN project *NiRvAna – Noninterference and Reversibility Analysis in Private Blockchains* as well as the INdAM-GNCS project *Proprietà Qualitative e Quantitative di Sistemi Reversibili*.

## **References**



### Explainability of Probabilistic Bisimilarity Distances for Labelled Markov Chains<sup>⋆</sup>

Amgad Rady and Franck van Breugel

DisCoVeri Group, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada franck@yorku.ca

Abstract. Probabilistic bisimilarity distances measure the similarity of behaviour of states of a labelled Markov chain. The smaller the distance between two states, the more alike they behave. Their distance is zero if and only if they are probabilistic bisimilar. Recently, algorithms have been developed that can compute probabilistic bisimilarity distances for labelled Markov chains with thousands of states within seconds. However, say we compute that the distance of two states is 0.125. How does one explain that 0.125 captures the similarity of their behaviour?

In this paper, we address this question by returning to the definition of probabilistic bisimilarity distances proposed by Desharnais, Gupta, Jagadeesan, and Panangaden more than two decades ago. We use a slight variation of their logic to construct for each pair of states a sequence of formulas that explains the probabilistic bisimilarity distance of the states. Furthermore, we present an algorithm that computes those formulas and we show that each formula can be computed in polynomial time.

We also prove that our logic is minimal. That is, if we leave out any operator from the logic, then the resulting logic no longer provides a logical characterization of the probabilistic bisimilarity distances.

## 1 Introduction

The behavioural equivalence *bisimilarity*, due to Milner [41] and Park [44], is one of the cornerstones of concurrency theory. It captures which states of a labelled transition system, a simple yet widely used model of concurrent systems, behave the same. Hennessy and Milner [29] provided a *logical characterization* of bisimilarity by introducing a logic, known as Hennessy-Milner logic, and proving that states are bisimilar if and only if they satisfy the same formulas of the logic. If the labelled transition system has finitely many states then for two states that are not bisimilar there exists a formula, often referred to as a *distinguishing formula*, such that one state satisfies the formula whereas the other state does not. This formula explains why the two states are not bisimilar. Cleaveland [12] presented a polynomial time algorithm that computes a distinguishing formula for states that are not bisimilar. Consider the following labelled transition system.

<sup>⋆</sup> Supported by the Natural Sciences and Engineering Research Council of Canada.

The states s and t are not bisimilar. This can be explained by a formula that expresses that a state can transition to a state that can subsequently transition to a purple (square) state as well as a green (hexagon) state. State s satisfies this formula but state t does not.

To model randomness in systems, *labelled Markov chains* are often used. Larsen and Skou [39] introduced *probabilistic bisimilarity* to capture which states of a labelled Markov chain behave the same. They also introduced a logic that characterizes probabilistic bisimilarity. Desharnais, Edalat, and Panangaden [19] simplified that logic and presented a polynomial time algorithm that produces a formula that distinguishes two states which are not probabilistic bisimilar. Consider the following labelled Markov chain.

The states s and t are not probabilistic bisimilar. State t can transition with probability more than 1/2 to a green state that can transition to a purple state, whereas state s cannot. This property can be expressed in the logic, giving rise to a formula that distinguishes the states s and t.

Giacalone, Jou, and Smolka [27] observed that probabilistic bisimilarity is not robust. Minuscule changes to the probabilities may alter which states are probabilistic bisimilar. Instead of an equivalence relation, they suggested exploiting a *pseudometric* to capture the behavioural similarity of states. That is, each pair of states is assigned a distance, a real number in the interval [0, 1], which measures how similar the states behave. The smaller the distance, the more alike the states behave. Distance zero captures that the states are behaviourally equivalent.

Desharnais, Gupta, Jagadeesan, and Panangaden [20] presented such a pseudometric. They showed that distance zero captures probabilistic bisimilarity. Therefore, those distances are known as *probabilistic bisimilarity distances*. These distances can be computed in polynomial time, as has been shown by Chen et al. [11]. Tang [48] developed and implemented algorithms that can compute the probabilistic bisimilarity distances for labelled Markov chains with thousands of states within seconds. The states s and t in the above labelled Markov chain have distance 0.125. How does one explain that 0.125 captures the similarity of their behaviour? That is the main question that we address in this paper.

To define their probabilistic bisimilarity distances, Desharnais et al. introduce a logic. The labelled Markov chains that they consider differ slightly from the ones we study in this paper: they label transitions whereas we label states (by colours/shapes), and where we require that the probabilities of the outgoing transitions of a state add up to one, they allow them to sum to less than one as well. State-labelled Markov chains have become the norm in probabilistic model checking. Probabilistic model checkers such as PRISM [38] and Storm [14] consider state-labelled Markov chains. Since each transition-labelled Markov chain can be encoded as a state-labelled one [46], this difference does not substantially impact any of the results. If the probabilities do not sum to one, one can add an additional state and transition to that state with the remaining probability. Also this difference does not significantly change the results. Adjusted to our setting, slightly simplified, and using a different syntax, the logic can be captured by the following grammar:

$$\varphi ::= a \mid \neg \varphi \mid \varphi \land \varphi \mid \bigcirc \varphi \mid \varphi \ominus q$$

where a is a label of a state and q is a rational in the interval [0, 1]. This logic characterizes the probabilistic bisimilarity distances (see, for example, [20,6]). Roughly speaking, the distance of two states is determined by a formula of the logic that distinguishes them the most. Such a formula explains their probabilistic bisimilarity distance. Consider, for example, the states s and t in the above labelled Markov chain. As we already mentioned, their distance is 0.125. This distance can be explained by the formula ○(green ∧ ○purple), where green and purple stand for the corresponding state labels. This formula captures the probability of reaching a green state in one transition and subsequently reaching a purple state after the second transition. For state s that probability is 0.5 and it is 0.625 for state t. Note that the operator ○ is similar to the next operator of linear temporal logic. Roughly, the interpretation of the formula ○ϕ in state s is the probability that ϕ holds in the successors of s.

As is common, we provide the above logic with a real-valued interpretation. For a formula of the logic, its interpretation maps each state of the labelled Markov chain to a real value in the interval [0, 1]. For example, for the formula ○(green ∧ ○purple), its interpretation in state s is denoted by ⟦○(green ∧ ○purple)⟧(s) and has the value 0.5. The value of ⟦○(green ∧ ○purple)⟧(t) is 0.625. Their difference, which is 0.125, is the distance of the states s and t. The distinguishing formula for the states s and t is fairly simple. As we will discuss next, we need all the operators of the logic to explain the probabilistic bisimilarity distances and a single formula may not suffice.

#### 1.1 Main Results

As we will show, the above logic is a *minimal* logic that characterizes the probabilistic bisimilarity distances. That is, if we remove any operator from the logic then the resulting logic does not characterize the probabilistic bisimilarity distances anymore. Furthermore, we will demonstrate that there exist finite labelled Markov chains for which the distances of some states cannot be explained by a single formula. However, as we will prove, we can explain the probabilistic bisimilarity distances by means of a sequence of formulas. Given two states, say u and v, we will construct a sequence of formulas ϕ<sup>0</sup><sub>uv</sub>, ϕ<sup>1</sup><sub>uv</sub>, ϕ<sup>2</sup><sub>uv</sub>, … such that the sequence ⟦ϕ<sup>0</sup><sub>uv</sub>⟧(u) − ⟦ϕ<sup>0</sup><sub>uv</sub>⟧(v), ⟦ϕ<sup>1</sup><sub>uv</sub>⟧(u) − ⟦ϕ<sup>1</sup><sub>uv</sub>⟧(v), ⟦ϕ<sup>2</sup><sub>uv</sub>⟧(u) − ⟦ϕ<sup>2</sup><sub>uv</sub>⟧(v), … converges to the probabilistic bisimilarity distance of u and v. We will also present an algorithm that computes those formulas and we will show that each formula can be computed in polynomial time.

#### 1.2 Related Work

In addition to the references to the literature mentioned above, next we will discuss some other related work. Many of the behavioural equivalences have been characterized logically. For example, Feng and Zhang [25] provide a logical characterization of probabilistic bisimilarity for probabilistic automata. Bernardo and Miculan [4] present an algorithm that builds a distinguishing formula for states of a probabilistic automaton that are not probabilistic bisimilar. König, Mika-Michalski, and Schröder [37] propose a general method to construct a distinguishing formula for a variety of systems, including probabilistic automata.

Behavioural pseudometrics have been introduced for a large variety of systems that model randomness. For example, Ferns, Panangaden, and Precup [26] study probabilistic bisimilarity distances for Markov decision processes, Deng, Chothia, Palamidessi, and Pang [15] introduce them for probabilistic automata, and De Alfaro, Majumdar, Raman, and Stoelinga [1] present them for games.

Also many behavioural pseudometrics have been characterized logically. For example, Desharnais, Laviolette, and Tracol [23] present a logical characterization of ε-bisimilarity, a notion closely related to distances, for probabilistic automata. Du, Deng, and Gebler [24] logically characterize probabilistic bisimilarity distances for probabilistic automata. Pantelic and Lawford [43] provide a logical characterization of a behavioural pseudometric for probabilistic discrete event structures. Komorida et al. [35], König and Mika-Michalski [36], Wild and Schröder [51], as well as Wißmann, Milius, and Schröder [52], present general frameworks to obtain logical characterizations of behavioural pseudometrics.

Whereas many logics for systems with randomness have a real-valued interpretation, Castiglione, Gebler, and Tini [9,10] introduce a logic for probabilistic automata with a boolean-valued interpretation. Their logic contains an operator with which we can express properties such as "a state can transition with probability a half to a purple state and with probability a half to a green state." It is this operator that allows them to define a mimicking formula of a state. As the name suggests, this formula mimics the behaviour of the state. Furthermore, they endow the formulas with a pseudometric and show that the probabilistic bisimilarity distance of two states is the distance of their mimicking formulas. Hence, the distance of two states can be explained by means of the mimicking formulas of those states.

## 2 Labelled Markov Chains and Probabilistic Bisimilarity Distances

In this section, we introduce several key notions that play a central role in the remainder of the paper. We define the model of interest, namely a labelled Markov chain. Furthermore, we introduce probabilistic bisimilarity, an equivalence relation that captures which states of a labelled Markov chain behave the same, and probabilistic bisimilarity distances, which measure the similarity of behaviour of those states.

First, we recall some notions from probability theory. Given a finite set X, a function μ : X → [0, 1] is a *probability distribution* on X if Σ<sub>x∈X</sub> μ(x) = 1. We denote the set of probability distributions on X by 𝒟<sub>ℝ</sub>(X). For μ ∈ 𝒟<sub>ℝ</sub>(X) and A ⊆ X, we often write μ(A) for Σ<sub>x∈A</sub> μ(x). Similarly, for ω ∈ 𝒟<sub>ℝ</sub>(X × X), a ∈ X, and A ⊆ X, we usually write ω(a, A) for Σ<sub>x∈A</sub> ω(a, x). For μ ∈ 𝒟<sub>ℝ</sub>(X), we define the *support* of μ by support(μ) = { x ∈ X | μ(x) > 0 }. A probability distribution μ ∈ 𝒟<sub>ℝ</sub>(X) is *rational* if μ(x) ∈ ℚ for all x ∈ X. We denote the set of rational probability distributions on X by 𝒟<sub>ℚ</sub>(X). Obviously, 𝒟<sub>ℚ</sub>(X) ⊆ 𝒟<sub>ℝ</sub>(X).

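Definition 1. *A* labelled Markov chain *is a tuple* ⟨S, L, τ, ℓ⟩ *consisting of*

- *a finite set* S *of states,*
- *a finite set* L *of labels,*
- *a transition function* τ : S → 𝒟<sub>ℚ</sub>(S)*, and*
- *a labelling function* ℓ : S → L*.*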

We restrict the transition probabilities to rationals as we will compute with them in Sections 6 and 7. For the remainder, we fix a labelled Markov chain ⟨S, L, τ, ℓ⟩. We define probabilistic bisimilarity by means of the set Ω<sub>ℝ</sub>(μ, ν), which is known as the *transportation polytope* [33] of the probability distributions μ and ν.

Definition 2. *For all* μ*,* ν ∈ 𝒟<sub>ℝ</sub>(S)*, the set* Ω<sub>ℝ</sub>(μ, ν) *is defined by*

$$\Omega\_{\mathbb{R}}(\mu, \nu) = \{\, \omega \in \mathcal{D}\_{\mathbb{R}}(S \times S) \mid \forall s \in S : \omega(s, S) = \mu(s) \wedge \omega(S, s) = \nu(s) \,\}.$$
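To make the two marginal conditions of Definition 2 concrete, the following Python sketch tests whether a candidate ω belongs to the transportation polytope of two distributions. The state names, the example distributions, and the function name are assumptions chosen for illustration; distributions are dictionaries in which missing entries stand for probability zero.

```python
from fractions import Fraction as F

S = ["s", "t", "u"]  # hypothetical state space

def in_transportation_polytope(omega, mu, nu):
    """True iff omega is a distribution on S x S whose first marginal
    is mu and whose second marginal is nu."""
    if any(p < 0 for p in omega.values()) or sum(omega.values()) != 1:
        return False
    left = all(sum(omega.get((s, x), F(0)) for x in S) == mu.get(s, F(0))
               for s in S)
    right = all(sum(omega.get((x, s), F(0)) for x in S) == nu.get(s, F(0))
                for s in S)
    return left and right

mu = {"s": F(1, 2), "t": F(1, 2)}
nu = {"t": F(1, 4), "u": F(3, 4)}

# The independent (product) coupling always has the right marginals.
product = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
# A different element of the same polytope.
other = {("s", "u"): F(1, 2), ("t", "t"): F(1, 4), ("t", "u"): F(1, 4)}

print(in_transportation_polytope(product, mu, nu),
      in_transportation_polytope(other, mu, nu))
```

The polytope generally contains many elements, as the two couplings above show, which is why the definitions that follow quantify over its members.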

Definition 3. *A relation* R ⊆ S × S *is a* probabilistic bisimulation *if for all* (s, t) ∈ R*,* ℓ(s) = ℓ(t) *and there exists* ω ∈ Ω<sub>ℝ</sub>(τ(s), τ(t)) *with* support(ω) ⊆ R*. States* s *and* t *are* probabilistic bisimilar*, denoted* s ∼ t*, if* (s, t) ∈ R *for some probabilistic bisimulation* R*.*

To define the probabilistic bisimilarity distances, it is convenient to partition the set of state pairs into the following three sets.

Definition 4. *The sets* S<sup>2</sup><sub>0</sub>*,* S<sup>2</sup><sub>1</sub> *and* S<sup>2</sup><sub>?</sub> *are defined by*

$$\begin{aligned} S\_0^2 &= \{ (s, t) \in S \times S \mid s \sim t \} \\ S\_1^2 &= \{ (s, t) \in S \times S \mid \ell(s) \neq \ell(t) \} \\ S\_?^2 &= (S \times S) \backslash (S\_0^2 \cup S\_1^2) \end{aligned}$$
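The partition of Definition 4 can be computed once probabilistic bisimilarity is known. The Python sketch below is an illustration under stated assumptions: the chain, its labels, and all function names are made up, and bisimilarity is computed by partition refinement rather than by searching for couplings. This swaps in the standard characterization that, for finite chains, two states are probabilistic bisimilar iff they carry the same label and assign the same probability to every bisimilarity class.

```python
from fractions import Fraction as F

# Hypothetical labelled Markov chain; all names and numbers are made up.
label = {"s": "blue", "t": "blue", "u": "green", "v": "purple"}
tau = {
    "s": {"u": F(1, 2), "v": F(1, 2)},
    "t": {"u": F(1, 4), "v": F(3, 4)},
    "u": {"u": F(1)},
    "v": {"v": F(1)},
}
S = sorted(label)

def bisimilarity_classes():
    """Partition refinement: split blocks until all states of a block
    agree on the probability of moving into every block."""
    block_of = {s: label[s] for s in S}  # initial split by label
    while True:
        signature = {}
        for s in S:
            mass = {}
            for t, p in tau[s].items():
                mass[block_of[t]] = mass.get(block_of[t], F(0)) + p
            signature[s] = (block_of[s], frozenset(mass.items()))
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in S}
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split: the partition is stable
        block_of = refined

def classify():
    """Return the three sets of Definition 4 as sets of state pairs."""
    block_of = bisimilarity_classes()
    zero, one, unknown = set(), set(), set()
    for s in S:
        for t in S:
            if block_of[s] == block_of[t]:
                zero.add((s, t))
            elif label[s] != label[t]:
                one.add((s, t))
            else:
                unknown.add((s, t))
    return zero, one, unknown

zero, one, unknown = classify()
print(sorted(unknown))
```

In this toy chain only the pairs (s, t) and (t, s) end up in the third set: the two states share a label but reach the green and purple states with different probabilities.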

The set S<sup>2</sup><sub>0</sub> contains those state pairs that have distance zero (cf. Theorem 6). The set S<sup>2</sup><sub>1</sub> contains those state pairs that have a different label and, therefore, have distance one (cf. Definition 5). The set S<sup>2</sup><sub>?</sub> contains the remaining state pairs. Note that some of these state pairs may have distance one, but cannot have distance zero. The probabilistic bisimilarity distances are defined in terms of the following function.

Definition 5. The function Δ : (S × S → [0, 1]) → (S × S → [0, 1]) is defined by

$$\Delta(d)(s,t) = \begin{cases} 0 & \text{if } (s,t) \in S\_0^2 \\ 1 & \text{if } (s,t) \in S\_1^2 \\ \inf\_{\omega \in \Omega(\tau(s), \tau(t))} \sum\_{u,v \in S} \omega(u,v) \, d(u,v) & \text{if } (s,t) \in S\_?^2 \end{cases}$$

Let d ∈ S × S → [0, 1] and ω ∈ 𝒟<sub>ℝ</sub>(S × S). Instead of Σ<sub>u,v∈S</sub> ω(u, v) d(u, v) we write ω · d in the remainder to avoid clutter. Similarly, for f ∈ S → [0, 1] and μ ∈ 𝒟<sub>ℝ</sub>(S) we write f · μ instead of Σ<sub>s∈S</sub> f(s) μ(s).

For d, e ∈ S × S → [0, 1], we define d ⊑ e if for all s, t ∈ S, d(s, t) ≤ e(s, t). According to, for example, [22, Lemma 3.2], ⟨S × S → [0, 1], ⊑⟩ is a complete lattice. Since the function Δ is a monotone function from a complete lattice to itself, we can conclude from the Knaster-Tarski fixed point theorem (see, for example, [13, Theorem 2.35]) that Δ has a least fixed point. We denote this least fixed point by δ. This least fixed point maps each pair of states to a real number in the interval [0, 1]: the probabilistic bisimilarity distance of the states. Distance zero captures probabilistic bisimilarity.

Theorem 6 ([21, Theorem 4.10]). For all s, t ∈ S, δ(s, t)=0 if and only if s ∼ t.

The probabilistic bisimilarity distance function δ is the limit of the distance functions δ<sub>n</sub> which only consider the first n transitions when comparing the similarity of the behaviour of states. This result can be seen as an instance of the Kleene fixed point theorem [34].

Definition 7. For each n ≥ 0, the function δ<sub>n</sub> : S × S → [0, 1] is defined by

$$\delta\_n(s,t) = \begin{cases} 0 & \text{if } n=0\\ \Delta(\delta\_{n-1})(s,t) & \text{otherwise.} \end{cases}$$

Proposition 8. lim<sub>n→∞</sub> δ<sub>n</sub> = δ.

## 3 A Logical Characterization

Below, we present a logical characterization of the probabilistic bisimilarity distances. We start with a logic very similar to the one introduced by Desharnais et al. [20].

Definition 9. *The logic* ℒ<sub>¬</sub> *is defined by*

$$\varphi ::= a \mid \bigcirc \varphi \mid \neg \varphi \mid \varphi \ominus q \mid \varphi \vee \varphi$$

*where* a ∈ L *and* q ∈ ℚ ∩ [0, 1]*.*

The above logic is slightly different from the one presented in [20] as we consider Markov chains with labelled states, whereas Desharnais et al. studied Markov chains with labelled transitions. In particular, a and ○ϕ were combined as ⟨a⟩ϕ. Since we restrict our attention to finite state systems, we can restrict ourselves to finite disjunctions. In our setting, the constants true and false can be expressed as ⋁<sub>a∈L</sub> a (recall that we assume that the set L is finite as well) and ¬true, respectively. The logic of Desharnais et al. also contains the operator ⌈ϕ⌉<sup>q</sup>, which is redundant, as observed in [21, page 336]. The logic considered by Desharnais [18] lacks negation, but does include ⌈ϕ⌉<sup>q</sup> and conjunction. The real-valued interpretation of the logic of Desharnais et al., which considers labelled transitions, is adjusted to our setting of labelled states as follows.

Definition 10. *The function* ⟦·⟧ : ℒ<sub>¬</sub> → S → [0, 1] *is defined by*

$$\begin{aligned} [\![a]\!](s) &= \begin{cases} 1 & \text{if } \ell(s) = a \\ 0 & \text{otherwise} \end{cases} \\ [\![\bigcirc \varphi]\!](s) &= [\![\varphi]\!] \cdot \tau(s) \\ [\![\neg \varphi]\!](s) &= 1 - [\![\varphi]\!](s) \\ [\![\varphi \ominus q]\!](s) &= \max([\![\varphi]\!](s) - q, 0) \\ [\![\varphi \vee \psi]\!](s) &= \max([\![\varphi]\!](s), [\![\psi]\!](s)) \end{aligned}$$

Note that false and true are the constant zero and constant one functions, respectively. The probabilistic bisimilarity distances can be characterized in terms of the logic.

Theorem 11 ([5, Theorem 40 and 44]). *For all* s*,* t ∈ S*,*

$$\delta(s, t) = \sup\_{\varphi \in \mathcal{L}\_{\neg}} [\![\varphi]\!](s) - [\![\varphi]\!](t).$$

In the remainder of this paper, we consider the following logic. This logic also characterizes probabilistic bisimilarity distances. As we will show later, this logic can explain the probabilistic bisimilarity distances more concisely than the logic presented above.

Definition 12. *The logic* ℒ *is defined by*

$$\varphi ::= a \mid \bigcirc \varphi \mid \varphi \ominus q \mid \varphi \oplus q \mid \varphi \vee \varphi \mid \varphi \wedge \varphi$$

*where* a ∈ L *and* q ∈ ℚ ∩ [0, 1]*.*

Note that negation has been removed and conjunction has been added. Also the operator ⊕q, which is dual to ⊖q, has been added. This logic is very similar to the one considered by Desharnais [18].

292 A. Rady and F. van Breugel

Definition 13. The function ⟦·⟧ : ℒ → S → [0, 1] of Definition 10 is modified by

$$\begin{array}{l} [\![\varphi \oplus q]\!](s) = \min\{[\![\varphi]\!](s) + q, 1\} \\ [\![\varphi \wedge \psi]\!](s) = \min\{[\![\varphi]\!](s), [\![\psi]\!](s)\} \end{array}$$
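The interpretation of Definitions 10 and 13 can be sketched as a small recursive evaluator. The tuple encoding of formulas, the name `evaluate`, and the four-state chain are illustrative assumptions rather than part of the paper; exact rationals are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical encoding of formulas as nested tuples:
# ("label", a), ("next", phi), ("minus", phi, q), ("plus", phi, q),
# ("or", phi, psi), ("and", phi, psi).
def evaluate(phi, s, tau, label):
    """Return the value of formula phi in state s as an exact rational."""
    op = phi[0]
    if op == "label":
        return Fraction(1) if label[s] == phi[1] else Fraction(0)
    if op == "next":
        # Expected value of phi after one transition from s.
        return sum((p * evaluate(phi[1], u, tau, label)
                    for u, p in tau[s].items()), Fraction(0))
    if op == "minus":
        return max(evaluate(phi[1], s, tau, label) - phi[2], Fraction(0))
    if op == "plus":
        return min(evaluate(phi[1], s, tau, label) + phi[2], Fraction(1))
    left = evaluate(phi[1], s, tau, label)
    right = evaluate(phi[2], s, tau, label)
    if op == "or":
        return max(left, right)
    if op == "and":
        return min(left, right)
    raise ValueError(f"unknown operator: {op}")

# Assumed example chain: s and t are blue; g (green) and p (purple) are absorbing.
label = {"s": "blue", "t": "blue", "g": "green", "p": "purple"}
tau = {
    "s": {"g": Fraction(1, 2), "p": Fraction(1, 2)},
    "t": {"g": Fraction(1, 4), "p": Fraction(3, 4)},
    "g": {"g": Fraction(1)},
    "p": {"p": Fraction(1)},
}

# The formula (next green) minus 1/4 separates s from t in this chain.
phi = ("minus", ("next", ("label", "green")), Fraction(1, 4))
gap = evaluate(phi, "s", tau, label) - evaluate(phi, "t", tau, label)
```

In this assumed chain the formula takes a positive value in s and value zero in t, which is the shape of explanation that Section 5 constructs in general.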

As already mentioned above, this logic also characterizes the probabilistic bisimilarity distances.

Theorem 14. For all s, t ∈ S, δ(s, t) = sup<sub>ϕ∈ℒ</sub> ⟦ϕ⟧(s) − ⟦ϕ⟧(t).

Proof sketch. Each formula of ℒ can be rewritten to an equivalent formula of ℒ<sub>¬</sub>. For example, if ϕ is rewritten to ψ then ϕ ⊕ q is rewritten to ¬(¬ψ ⊖ q). Each formula of ℒ has a dual: if ⟦ϕ⟧ = 1 − ⟦ψ⟧ then ϕ is a dual of ψ. For example, if ϕ is a dual of ψ then ϕ ⊖ q is a dual of ψ ⊕ q. Each formula of ℒ<sub>¬</sub> can be rewritten to an equivalent formula of ℒ. For example, if ϕ is rewritten to ψ then ¬ϕ is rewritten to a dual of ψ. The result now follows from Theorem 11.
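The two rewriting steps used in the proof sketch can be checked pointwise on values in [0, 1]. The following is a minimal sketch; the helper names and the grid of sample values are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    """Value of a negated formula, given the value x of the formula."""
    return 1 - x

def minus(x, q):
    """Value of (phi minus q), given the value x of phi."""
    return max(x - q, 0)

def plus(x, q):
    """Value of (phi plus q), given the value x of phi."""
    return min(x + q, 1)

def rewrites_agree(steps=16):
    """Check both identities for all x, q on a grid of rationals in [0, 1]."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return all(
        # phi plus q has the same value as not(not(phi) minus q)
        plus(x, q) == neg(minus(neg(x), q))
        # if a formula has value 1 - x, then (it minus q) is a dual of (phi plus q)
        and minus(neg(x), q) == neg(plus(x, q))
        for x, q in product(grid, grid)
    )
```

A grid check is of course not a proof; both identities also follow directly from 1 − max(a, b) = min(1 − a, 1 − b).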

## 4 All Operators are Necessary

The logic ℒ is a minimal logic that characterizes the probabilistic bisimilarity distances. That is, if we remove any operator from the logic then the resulting logic no longer characterizes the probabilistic bisimilarity distances. Due to lack of space, we only consider the logic ℒ<sub>\⊖</sub>, which does not have the ⊖q operator.

Definition 15. The logic ℒ<sub>\⊖</sub> is defined by

$$\varphi ::= a \mid \bigcirc \varphi \mid \varphi \oplus q \mid \varphi \vee \varphi \mid \varphi \wedge \varphi$$

where a ∈ L and q ∈ ℚ ∩ [0, 1].

Theorem 16. There exists a labelled Markov chain ⟨S, L, τ, ℓ⟩ and s, t ∈ S such that

$$\delta(s, t) > \sup\_{\varphi \in \mathcal{L}\_{\backslash \ominus}} [\![\varphi]\!](s) - [\![\varphi]\!](t).$$

Proof sketch. Consider the following labelled Markov chain.

It can be shown that δ(s, t) = <sup>7</sup>/<sub>8</sub>. Furthermore, we can prove that for all ϕ ∈ ℒ<sub>\⊖</sub> and q ∈ ℚ ∩ [0, 1], if ⟦ϕ⟧(u) < <sup>1</sup>/<sub>8</sub> − <sup>q</sup>/<sub>2</sub> then ⟦ϕ⟧(v) < <sup>3</sup>/<sub>4</sub> − q by structural induction on ϕ. Using this result and Theorem 14, we can also show that for all ϕ ∈ ℒ<sub>\⊖</sub>, ⟦ϕ⟧(s) − ⟦ϕ⟧(t) ≤ <sup>27</sup>/<sub>32</sub> by structural induction on ϕ.

## 5 Explainability

In general, the probabilistic bisimilarity distance of two states cannot be explained by a single formula, as we will show next. That is, generally there does not exist a distinguishing formula for every pair of states of a labelled Markov chain. But, as we will prove below, for every pair of states there exists a sequence of formulas that explains their distance.

Theorem 17. *There exists a labelled Markov chain* ⟨S, L, τ, ℓ⟩ *and* s*,* t ∈ S *such that for all* ϕ ∈ ℒ*,* δ(s, t) > ⟦ϕ⟧(s) − ⟦ϕ⟧(t)*.*

*Proof sketch.* Consider the following labelled Markov chain.

It can be shown that δ(s, t) = 1. We can also prove that for all ϕ ∈ ℒ, ⟦ϕ⟧(s) − ⟦ϕ⟧(t) < 1 by structural induction on ϕ.

As we will show next, for every pair of states (s, t) there exists a sequence of formulas (ξ<sub>n</sub>)<sub>n</sub> such that δ(s, t) = lim<sub>n→∞</sub> ⟦ξ<sub>n</sub>⟧(s) − ⟦ξ<sub>n</sub>⟧(t). This sequence (ξ<sub>n</sub>)<sub>n</sub> explains the distance δ(s, t).

Proposition 18. *For all* s*,* t ∈ S *there exists* (ξ<sub>n</sub>)<sub>n</sub> *such that*

$$\delta(s, t) = \lim\_{n \to \infty} [\![\xi\_n]\!](s) - [\![\xi\_n]\!](t).$$

*Proof sketch.* This can be concluded from Theorem 14 and the following. Let X be a nonempty subset of ℝ that is bounded above. Then there exists a sequence (x<sub>n</sub>)<sub>n</sub> in X that converges to sup X [8, page 4].

The proof of the above proposition is *not* constructive. Below, we will construct a sequence of formulas (ϕ<sup>n</sup><sub>st</sub>)<sub>n</sub> that explains the distance of the states s and t. In particular, ϕ<sup>n</sup><sub>st</sub> is constructed so that

$$[\![\varphi\_{st}^{n}]\!](s) = \delta\_{n}(s,t) \text{ and } [\![\varphi\_{st}^{n}]\!](t) = 0$$

and, hence, ⟦ϕ<sup>n</sup><sub>st</sub>⟧(s) − ⟦ϕ<sup>n</sup><sub>st</sub>⟧(t) = δ<sub>n</sub>(s, t). That is, the formula ϕ<sup>n</sup><sub>st</sub> explains the distance δ<sub>n</sub>(s, t).

If n = 0 then δ<sub>n</sub>(s, t) = 0. We choose the formula false since

$$[\![\text{false}]\!](s) = 0 = \delta\_0(s, t) \text{ and } [\![\text{false}]\!](t) = 0.$$

Let n > 0. For (s, t) ∈ S<sup>2</sup><sub>0</sub>, also δ<sub>n</sub>(s, t) = 0. Again we choose the formula false to explain the distance. For (s, t) ∈ S<sup>2</sup><sub>1</sub>, we have that δ<sub>n</sub>(s, t) = 1. In this case the formula ℓ(s) explains δ<sub>n</sub>(s, t) since ℓ(s) ≠ ℓ(t) and, therefore,

$$[\![\ell(s)]\!](s) = 1 = \delta\_n(s, t) \text{ and } [\![\ell(s)]\!](t) = 0.$$

To construct a formula that explains distance δ<sub>n</sub>(s, t) for (s, t) ∈ S<sup>2</sup><sub>?</sub>, we rely on the following result about distances and nonexpansive functions. A function f ∈ S → [0, 1] is *nonexpansive* if for all s, t ∈ S, |f(s) − f(t)| ≤ δ<sub>n</sub>(s, t). The set of nonexpansive functions is denoted by (S, δ<sub>n</sub>) ↠ [0, 1]. This set forms a convex polytope and is known as the *Lipschitz polytope*. We denote its *vertices* by V((S, δ<sub>n</sub>) ↠ [0, 1]).

Proposition 19. *For all* (s, t) ∈ S<sup>2</sup><sub>?</sub> *and* n ≥ 0*, there exists* f<sup>n</sup><sub>st</sub> ∈ (S, δ<sub>n</sub>) ↠ (ℚ ∩ [0, 1]) *such that* δ<sub>n+1</sub>(s, t) = f<sup>n</sup><sub>st</sub> · (τ(s) − τ(t))*.*

*Proof sketch.* Let (s, t) ∈ S<sup>2</sup><sub>?</sub> and n ≥ 0. Then

$$\delta\_{n+1}(s,t) = \inf\_{\omega \in \Omega\_{\mathbb{R}}(\tau(s), \tau(t))} \omega \cdot \delta\_n.$$

We can view δn+1(s, t) as the minimal cost of a transportation problem, where τ (s)(u) represents the amount transported from the origin u, τ (t)(v) captures the amount received at the destination v, δn(u, v) represents the transportation cost from u to v, and each ω captures a transportation plan, that is, ω(u, v) is the amount transported from u to v (see, for example, [40, page 15]).

From the Kantorovich-Rubinstein duality theorem [31] we can conclude that

$$\inf\_{\omega \in \Omega\_{\mathbb{R}}(\tau(s), \tau(t))} \omega \cdot \delta\_n = \sup\_{f \in (S, \delta\_n) \twoheadrightarrow [0, 1]} f \cdot (\tau(s) - \tau(t)).$$

In this dual to the above transportation problem, each f represents a price function (see, for example, [40, page 81]). Since a linear function on a convex polytope attains its maximum at a vertex (see, for example, [49, Theorem 2 of Chapter 1]), we can conclude that

$$\sup\_{f \in (S, \delta\_n) \twoheadrightarrow [0, 1]} f \cdot (\tau(s) - \tau(t)) = \max\_{f \in V((S, \delta\_n) \twoheadrightarrow [0, 1])} f \cdot (\tau(s) - \tau(t)).$$

Since we can prove that V((S, δ<sub>n</sub>) ↠ [0, 1]) ⊆ (S, δ<sub>n</sub>) ↠ (ℚ ∩ [0, 1]), there exists f<sup>n</sup><sub>st</sub> ∈ (S, δ<sub>n</sub>) ↠ (ℚ ∩ [0, 1]) such that δ<sub>n+1</sub>(s, t) = f<sup>n</sup><sub>st</sub> · (τ(s) − τ(t)).
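The transportation view used in the proof sketch can be made concrete on a two-point example. The distributions, the cost function, and all names below are assumptions chosen for illustration; the sketch compares two transportation plans with one price function.

```python
from fractions import Fraction

def is_coupling(omega, mu, nu):
    """Check that omega has marginals mu (rows) and nu (columns)."""
    states = list(mu)
    rows = all(sum(omega.get((u, v), 0) for v in states) == mu[u] for u in states)
    cols = all(sum(omega.get((u, v), 0) for u in states) == nu[v] for v in states)
    return rows and cols

def cost(omega, d):
    """Cost of the plan omega, i.e. the sum of omega(u, v) * d(u, v)."""
    return sum(weight * d[pair] for pair, weight in omega.items())

def dual_value(f, mu, nu):
    """Value f . (mu - nu) of a price function f."""
    return sum(f[u] * (mu[u] - nu[u]) for u in mu)

# Assumed example data.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
d = {("a", "a"): 0, ("a", "b"): 1, ("b", "a"): 1, ("b", "b"): 0}

# The independent plan is feasible but wasteful.
product_plan = {(u, v): mu[u] * nu[v] for u in mu for v in nu}
# This plan keeps as much mass in place as possible.
better_plan = {("a", "a"): Fraction(1, 4), ("a", "b"): Fraction(1, 4),
               ("b", "b"): Fraction(1, 2)}
# A price function that is nonexpansive with respect to d.
price = {"a": 1, "b": 0}
```

Since every plan costs at least as much as any nonexpansive price function yields, a plan and a price function with the same value certify each other's optimality; in this example `better_plan` and `price` do so.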

The function f<sup>n</sup><sub>st</sub> plays a key role in the formula explaining δ<sub>n</sub>(s, t). However, f<sup>n</sup><sub>st</sub> is not necessarily unique. Consider the following labelled Markov chain.

For this example, the sequence (δ<sub>n</sub>)<sub>n</sub> converges in three steps, that is, δ = δ<sub>3</sub>. We have that δ<sub>2</sub>(u, v) = <sup>1</sup>/<sub>2</sub> and δ<sub>3</sub>(s, t) = <sup>1</sup>/<sub>2</sub>. So we need the function f<sup>2</sup><sub>st</sub> to satisfy δ<sub>3</sub>(s, t) = f<sup>2</sup><sub>st</sub>(u) − f<sup>2</sup><sub>st</sub>(v) and |f<sup>2</sup><sub>st</sub>(u) − f<sup>2</sup><sub>st</sub>(v)| ≤ <sup>1</sup>/<sub>2</sub>. For each 0 ≤ q ≤ <sup>1</sup>/<sub>2</sub>, f<sup>2</sup><sub>st</sub>(u) = <sup>1</sup>/<sub>2</sub> + q and f<sup>2</sup><sub>st</sub>(v) = q satisfy these properties. As we will see, any f<sup>n</sup><sub>st</sub> that satisfies these properties can be used to construct ϕ<sup>n</sup><sub>st</sub>. How to compute these functions f<sup>n</sup><sub>st</sub> is the topic of the next section.

As we will show in Theorem 22, we can construct a formula ψ<sup>n</sup><sub>st</sub> that captures the function f<sup>n</sup><sub>st</sub>, that is, ⟦ψ<sup>n</sup><sub>st</sub>⟧ = f<sup>n</sup><sub>st</sub>. More about this soon. By means of ψ<sup>n−1</sup><sub>st</sub> we can explain the distance δ<sub>n</sub>(s, t) by the formula (○ψ<sup>n−1</sup><sub>st</sub>) ⊖ (f<sup>n−1</sup><sub>st</sub> · τ(t)) since we have that

$$\begin{aligned} [\![(\bigcirc \psi\_{st}^{n-1}) \ominus (f\_{st}^{n-1} \cdot \tau(t))]\!](s) &= \max \{ ([\![\psi\_{st}^{n-1}]\!] \cdot \tau(s)) - (f\_{st}^{n-1} \cdot \tau(t)), 0 \} \\ &= \max \{ (f\_{st}^{n-1} \cdot \tau(s)) - (f\_{st}^{n-1} \cdot \tau(t)), 0 \} \\ &= \max \{ f\_{st}^{n-1} \cdot (\tau(s) - \tau(t)), 0 \} \\ &= \max \{ \delta\_n(s, t), 0 \} \\ &= \delta\_n(s, t) \end{aligned}$$

and, similarly, we can deduce that

$$[\![(\bigcirc \psi\_{st}^{n-1}) \ominus (f\_{st}^{n-1} \cdot \tau(t))]\!](t) = \max \{ f\_{st}^{n-1} \cdot (\tau(t) - \tau(t)), 0 \} = 0.$$

Let us return to the formula ψ<sup>n</sup><sub>st</sub> that captures the function f<sup>n</sup><sub>st</sub>. To construct ψ<sup>n</sup><sub>st</sub> we use the following result.

Lemma 20 ([2, Lemma A7.2]). *Let* f ∈ S → [0, 1]*. If for all* u*,* v ∈ S*, there exists* g<sub>uv</sub> ∈ S → [0, 1] *such that* g<sub>uv</sub>(u) = f(u) *and* g<sub>uv</sub>(v) = f(v)*, then*

$$f = \min\_{u \in S} \max\_{v \in S} g\_{uv} = \max\_{u \in S} \min\_{v \in S} g\_{uv}.$$
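The identity of Lemma 20 can be tried out on randomly generated witnesses. In this sketch the state names, the random seed, and the helper names are assumptions; each g<sub>uv</sub> agrees with f at u and v and is arbitrary elsewhere.

```python
import random
from fractions import Fraction

def random_witnesses(states, f, rng):
    """Build functions g[u, v] that agree with f at u and at v."""
    g = {}
    for u in states:
        for v in states:
            guv = {x: Fraction(rng.randint(0, 10), 10) for x in states}
            guv[u], guv[v] = f[u], f[v]
            g[u, v] = guv
    return g

def min_max(states, g):
    """Pointwise minimum over u of the pointwise maximum over v of g[u, v]."""
    return {x: min(max(g[u, v][x] for v in states) for u in states)
            for x in states}

def max_min(states, g):
    """Pointwise maximum over u of the pointwise minimum over v of g[u, v]."""
    return {x: max(min(g[u, v][x] for v in states) for u in states)
            for x in states}

rng = random.Random(0)
states = ["s0", "s1", "s2", "s3"]
f = {x: Fraction(rng.randint(0, 10), 10) for x in states}
g = random_witnesses(states, f, rng)
```

Both `min_max(states, g)` and `max_min(states, g)` reproduce `f`: for a fixed state x, the witness g<sub>ux</sub> pins every inner maximum to at least f(x), and the row u = x attains exactly f(x).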

To apply the above lemma, we need to construct for all u, v ∈ S a formula ψ<sup>n</sup><sub>stuv</sub> such that

$$[\![\psi\_{stuv}^n]\!](u) = f\_{st}^n(u) \text{ and } [\![\psi\_{stuv}^n]\!](v) = f\_{st}^n(v).$$

The details are provided in Definition 21 and Theorem 22. From Lemma 20 we can then conclude that

$$\Bigl[\!\!\Bigl[\bigwedge\_{u \in S} \bigvee\_{v \in S} \psi\_{stuv}^n \Bigr]\!\!\Bigr] = \Bigl[\!\!\Bigl[\bigvee\_{u \in S} \bigwedge\_{v \in S} \psi\_{stuv}^n \Bigr]\!\!\Bigr] = f\_{st}^n.$$

The above can be summarized as follows.

Definition 21. *For all* s*,* t ∈ S*,*

$$
\varphi\_{st}^{0} = \text{false}
$$

*and*

$$
\varphi\_{st}^1 = \begin{cases}
\text{false} & \text{if } (s,t) \in S\_0^2 \cup S\_?^2 \\
\ell(s) & \text{if } (s,t) \in S\_1^2
\end{cases}
$$

*For all* s*,* t ∈ S *and* n ≥ 2*,*

$$
\varphi\_{st}^{n} = \begin{cases}
\text{false} & \text{if } (s,t) \in S\_0^2 \\
\ell(s) & \text{if } (s,t) \in S\_1^2 \\
\left(\bigcirc \psi\_{st}^{n-1}\right) \ominus \left(f\_{st}^{n-1} \cdot \tau(t)\right) & \text{if } (s,t) \in S\_?^2
\end{cases}
$$

*For all* (s, t) ∈ S<sup>2</sup><sub>?</sub> *and* n ≥ 1*,*

$$
\psi\_{st}^n = \bigwedge\_{u \in S} \bigvee\_{v \in S} \psi\_{stuv}^n
$$

*For all* (s, t) ∈ S<sup>2</sup><sub>?</sub>*,* u*,* v ∈ S*, and* n ≥ 1*,*

$$
\psi\_{stuv}^{n} = \begin{cases}
\text{false} \oplus f\_{st}^{n}(u) & \text{if } f\_{st}^{n}(u) = f\_{st}^{n}(v) \\
\left(\varphi\_{uv}^{n} \ominus \left(\delta\_{n}(u,v) - \left(f\_{st}^{n}(u) - f\_{st}^{n}(v)\right)\right)\right) \oplus f\_{st}^{n}(v) & \text{if } f\_{st}^{n}(u) > f\_{st}^{n}(v) \\
\left(\varphi\_{vu}^{n} \ominus \left(\delta\_{n}(u,v) - \left(f\_{st}^{n}(v) - f\_{st}^{n}(u)\right)\right)\right) \oplus f\_{st}^{n}(u) & \text{otherwise.}
\end{cases}
$$

Note that, for (s, t) ∈ S<sup>2</sup><sub>?</sub> and n ≥ 2, the formula ϕ<sup>n</sup><sub>st</sub> contains |S|<sup>2</sup> subformulas of the form ϕ<sup>n−1</sup><sub>uv</sub>. As a consequence, the size of ϕ<sup>n</sup><sub>st</sub> grows exponentially in n. As we will see in Section 7, we can compute ϕ<sup>n</sup><sub>st</sub> in polynomial time by sharing subformulas.
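The gap between the unfolded and the shared representation can be estimated with a rough count. The sketch below assumes the worst case in which every pair of states lies in S<sup>2</sup><sub>?</sub>; the function names and the choice of k = |S| are illustrative assumptions.

```python
def unfolded_occurrences(n, k):
    """Occurrences of level-1 formulas when a level-n formula is written out
    as a tree, assuming each level refers to k * k formulas of the level below."""
    return 1 if n <= 1 else k * k * unfolded_occurrences(n - 1, k)

def shared_formulas(n, k):
    """Number of distinct formulas for levels 1..n and all k * k state pairs
    when every formula is stored once and referenced by its indices."""
    return n * k * k
```

For k = 10 and n = 3, for instance, the tree contains 10,000 occurrences of level-1 formulas, whereas a table indexed by state pair and level stores only 300 formulas.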

The above definition shows some similarities with the sequence of formulas introduced in [43, Definition 8]. Their setting is different: the transitions are labelled (as in [20]), the transition function is deterministic, and the labelling of the transitions is probabilistic. Their logic is simpler than the one introduced in [20] since the systems they consider are simpler. The sequence of formulas that they introduce is syntactically simpler than the one we define above. Their formulas are only used to prove a logical characterization, although those formulas can also be used for explainability.

Consider the states s and t of the following labelled Markov chain.

By definition, ϕ<sup>0</sup><sub>st</sub> = false and ϕ<sup>1</sup><sub>st</sub> = false. For ϕ<sup>2</sup><sub>st</sub> we get

$$\left( \bigcirc \left( \left. \begin{array}{l} \left( (\text{false} \oplus 0) \vee (\text{false} \oplus 0) \vee \dots \vee (\text{false} \oplus 0) \right) \wedge \\ \left( (\text{false} \oplus 0) \vee (\text{false} \oplus 0) \vee \dots \vee (\text{false} \oplus 0) \right) \wedge \\ \quad \vdots \\ \left( (\text{false} \oplus 0) \vee (\text{false} \oplus 0) \vee \dots \vee (\text{false} \oplus 0) \right) \end{array} \right\} \text{ten times} \right) \right) \ominus 0$$

This formula can be simplified to false. In the logic of Desharnais et al., which lacks ∧ and ⊕, one would need 111 additional ¬, making it less concise.

The formula ϕ<sup>3</sup><sub>st</sub> fills more than a page, but can be simplified to the formula (○(green ∧ ○green)) ⊖ 0.375. Although generally there does not exist a distinguishing formula for each pair of states (Theorem 17), in this case the formula ϕ<sup>3</sup><sub>st</sub> explains the distance of states s and t, since δ(s, t) = 0.125, ⟦ϕ<sup>3</sup><sub>st</sub>⟧(s) = 0.125 and ⟦ϕ<sup>3</sup><sub>st</sub>⟧(t) = 0. The formula captures the probability of reaching a green state in one transition and subsequently reaching another green state.

The formula ϕ<sup>3</sup><sub>ts</sub> can be simplified to (○(green ∧ ○purple)) ⊖ 0.5. Since we have that ⟦ϕ<sup>3</sup><sub>ts</sub>⟧(t) = 0.125 and ⟦ϕ<sup>3</sup><sub>ts</sub>⟧(s) = 0, the formula ϕ<sup>3</sup><sub>ts</sub> explains the distance δ(t, s) = 0.125. The formula represents the probability of reaching a green state in one transition and subsequently reaching a purple state.

The outermost test can be removed from the explanation. Hence, the formulas ○(green ∧ ○green) and ○(green ∧ ○purple) explain the distance of states s and t as well.

#### Theorem 22.


*Proof sketch.* This theorem can be proved by induction on n. Most steps of the proof have already been discussed above. To prove (c), let (s, t) ∈ S<sup>2</sup><sub>?</sub>, u, v ∈ S and n ≥ 1. We need to distinguish three cases. Here we only consider the case that f<sup>n</sup><sub>st</sub>(u) > f<sup>n</sup><sub>st</sub>(v). Then

$$\begin{aligned} [\![\psi\_{stuv}^n]\!](u) &= [\![(\varphi\_{uv}^n \ominus (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v)))) \oplus f\_{st}^n(v)]\!](u) \\ &= \min\{\max\{[\![\varphi\_{uv}^n]\!](u) - (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v))), 0\} + f\_{st}^n(v), 1\} \\ &= \min\{\max\{\delta\_n(u,v) - (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v))), 0\} + f\_{st}^n(v), 1\} \quad \text{[induction hypothesis of (a)]} \\ &= \min\{\max\{f\_{st}^n(u) - f\_{st}^n(v), 0\} + f\_{st}^n(v), 1\} \\ &= \min\{(f\_{st}^n(u) - f\_{st}^n(v)) + f\_{st}^n(v), 1\} \quad [f\_{st}^n(u) > f\_{st}^n(v)] \\ &= \min\{f\_{st}^n(u), 1\} \\ &= f\_{st}^n(u) \\[1ex] [\![\psi\_{stuv}^n]\!](v) &= [\![(\varphi\_{uv}^n \ominus (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v)))) \oplus f\_{st}^n(v)]\!](v) \\ &= \min\{\max\{[\![\varphi\_{uv}^n]\!](v) - (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v))), 0\} + f\_{st}^n(v), 1\} \\ &= \min\{\max\{0 - (\delta\_n(u,v) - (f\_{st}^n(u) - f\_{st}^n(v))), 0\} + f\_{st}^n(v), 1\} \quad \text{[induction hypothesis of (a)]} \\ &= \min\{0 + f\_{st}^n(v), 1\} \quad [f\_{st}^n(u) - f\_{st}^n(v) \le \delta\_n(u,v) \text{ since } f\_{st}^n \text{ is nonexpansive}] \\ &= f\_{st}^n(v) \end{aligned}$$
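The arithmetic of the case considered above can be checked on a grid of rational values. This is a sketch only; the function names and the grid are assumptions, and the two calls mirror the induction hypothesis that the inner formula has value δ<sub>n</sub>(u, v) in u and value 0 in v.

```python
from fractions import Fraction
from itertools import product

def psi_value(phi_value, dist, fu, fv):
    """Value of (phi minus (dist - (fu - fv))) plus fv, given the value of phi."""
    return min(max(phi_value - (dist - (fu - fv)), 0) + fv, 1)

def case_holds(steps=8):
    """Check the case fu > fv for all grid values with fu - fv <= dist."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    for dist, fu, fv in product(grid, repeat=3):
        if fu > fv and fu - fv <= dist:
            if psi_value(dist, dist, fu, fv) != fu:   # value in state u
                return False
            if psi_value(0, dist, fu, fv) != fv:      # value in state v
                return False
    return True
```

The side condition `fu - fv <= dist` is the nonexpansiveness of f<sup>n</sup><sub>st</sub>; without it the second equality fails, since the inner maximum would no longer be zero.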

Combining Proposition 8 and Theorem 22, we obtain the following explainability result.

Corollary 23. *For all* s*,* t ∈ S*,* lim<sub>n→∞</sub> ⟦ϕ<sup>n</sup><sub>st</sub>⟧(s) − ⟦ϕ<sup>n</sup><sub>st</sub>⟧(t) = δ(s, t)*.*

## 6 Computing *f<sup>n</sup><sub>st</sub>*

Proposition 19 states that the functions f<sup>n</sup><sub>st</sub> exist. Below, we will show that these functions can be computed in polynomial time.

Let (s, t) ∈ S<sup>2</sup><sub>?</sub>. The function f<sup>0</sup><sub>st</sub> ∈ S → (ℚ ∩ [0, 1]), defined as the constant zero function, satisfies δ<sub>1</sub>(s, t) = f<sup>0</sup><sub>st</sub> · (τ(s) − τ(t)) and can be computed in polynomial time. To prove that the remaining functions f<sup>n</sup><sub>st</sub>, with n ≥ 1, can be computed in polynomial time as well, we use the primal network simplex algorithm to solve minimum-cost flow problems due to Orlin [42] and the ellipsoid method to solve linear programming problems due to Khachiyan [32]. As we will show below, f<sup>n</sup><sub>st</sub> can be computed as FindVertex(δ<sub>n</sub>, τ(s), τ(t)).

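$$\begin{array}{ll} 1 & \text{FindVertex}(d, \mu, \nu) \\ 2 & \quad \text{input}: d \in S \times S \to (\mathbb{Q} \cap [0, 1]) \text{ with } d(s, s) = 0 \text{ for all } s \in S, \ \mu, \nu \in \mathcal{D}\_{\mathbb{Q}}(S) \\ 3 & \quad \text{output}: f\_{\mu\nu} \in (S, d) \twoheadrightarrow (\mathbb{Q} \cap [0, 1]) \text{ such that } f\_{\mu\nu} \cdot (\mu - \nu) = \inf\_{\omega \in \Omega\_{\mathbb{R}}(\mu, \nu)} \omega \cdot d \\ 4 & \quad d\_{\mu\nu} = \inf\_{\omega \in \Omega\_{\mathbb{R}}(\mu, \nu)} \omega \cdot d \\ 5 & \quad f\_{\mu\nu} = \text{vertex of } \{ f \in (S, d) \twoheadrightarrow [0, 1] \mid f \cdot (\mu - \nu) = d\_{\mu\nu} \} \\ 6 & \quad \text{return } f\_{\mu\nu} \end{array}$$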

In line 4 we use Orlin's primal network simplex algorithm to compute the minimum cost for the following network (N, E). The nodes of the network consist of two copies of each u ∈ S, denoted u<sub>0</sub> and u<sub>1</sub>. The supply of node u<sub>0</sub> is μ(u) and the demand of node u<sub>1</sub> is ν(u). Each edge (u<sub>0</sub>, v<sub>1</sub>) has cost d(u, v).

Each ω ∈ Ω<sub>ℝ</sub>(μ, ν) corresponds to a feasible flow, where ω(u, v) captures the flow from u<sub>0</sub> to v<sub>1</sub>. The constraints ω(u, S) = μ(u) and ω(S, u) = ν(u), defining Ω<sub>ℝ</sub>(μ, ν), capture that the supply of u<sub>0</sub> flows from u<sub>0</sub> and the demand of u<sub>1</sub> flows to u<sub>1</sub>. For a feasible flow ω, its cost is ω · d. Hence, d<sub>μν</sub> captures the minimum cost.

Note that, by definition, the supplies and demands are rational. We can prove that d<sub>μν</sub> = ω · d for some ω ∈ Ω<sub>ℚ</sub>(μ, ν). Since d is rational as well, we can conclude that d<sub>μν</sub> is also rational. Orlin's primal network simplex algorithm can compute the minimum cost and, hence, can be used to compute d<sub>μν</sub>. Orlin's algorithm is strongly polynomial: O(|N|<sup>2</sup>|E|<sup>2</sup> log |N|). Since there are 2|S| nodes and |S|<sup>2</sup> edges, d<sub>μν</sub> can be computed in O(|S|<sup>6</sup> log |S|).
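The network described above can be explored with a simple solver. The sketch below does not implement Orlin's primal network simplex algorithm; it swaps in the textbook successive-shortest-paths method with Bellman-Ford, which is easier to state but has no strongly polynomial guarantee. The function name and the node numbering are assumptions.

```python
from fractions import Fraction

def min_transport_cost(mu, nu, d):
    """Minimum cost of a flow from the nodes u_0 (supply mu) to the nodes v_1
    (demand nu), where edge (u_0, v_1) has cost d[u, v]. Exact rationals."""
    states = list(mu)
    n = len(states)
    source, sink = 2 * n, 2 * n + 1
    # graph[a] holds edges [target, residual capacity, cost, reverse index].
    graph = [[] for _ in range(2 * n + 2)]

    def add_edge(a, b, capacity, cost):
        graph[a].append([b, Fraction(capacity), Fraction(cost), len(graph[b])])
        graph[b].append([a, Fraction(0), -Fraction(cost), len(graph[a]) - 1])

    for i, u in enumerate(states):
        add_edge(source, i, mu[u], 0)        # supply of u_0
        add_edge(n + i, sink, nu[u], 0)      # demand of u_1
        for j, v in enumerate(states):
            add_edge(i, n + j, 1, d[u, v])   # edge (u_0, v_1)

    total = Fraction(0)
    while True:
        # Bellman-Ford over residual edges that still have capacity.
        dist = [None] * (2 * n + 2)
        parent = [None] * (2 * n + 2)
        dist[source] = Fraction(0)
        for _ in range(2 * n + 1):
            changed = False
            for a in range(2 * n + 2):
                if dist[a] is None:
                    continue
                for k, (b, capacity, cost, _rev) in enumerate(graph[a]):
                    if capacity > 0 and (dist[b] is None or dist[a] + cost < dist[b]):
                        dist[b], parent[b], changed = dist[a] + cost, (a, k), True
            if not changed:
                break
        if dist[sink] is None:
            return total                      # all supply has been shipped
        # Determine the bottleneck capacity along the cheapest path.
        push, node = None, sink
        while node != source:
            a, k = parent[node]
            push = graph[a][k][1] if push is None else min(push, graph[a][k][1])
            node = a
        # Augment along the path and update the reverse edges.
        node = sink
        while node != source:
            a, k = parent[node]
            graph[a][k][1] -= push
            graph[node][graph[a][k][3]][1] += push
            node = a
        total += push * dist[sink]
```

With μ = (1/2, 1/2), ν = (1/4, 3/4) and the discrete cost function on two states, the call returns 1/4, in line with the observation that the minimum cost is rational when all inputs are.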

In line 5 we use Khachiyan's ellipsoid method to find a feasible solution of a linear programming problem with the variables x<sub>s</sub>, for s ∈ S, and the constraints

$$\begin{aligned} \forall s, t \in S: x\_s - x\_t &\le d(s, t) \\ \forall s \in S: x\_s &\ge 0 \\ \forall s \in S: x\_s &\le 1 \\ \sum\_{s \in S} x\_s \left( \mu(s) - \nu(s) \right) &= d\_{\mu \nu} \end{aligned}$$

By means of the ellipsoid method we can find a vertex of the convex polytope defined by the above constraints. This method is polynomial in the size of the constraints, in this case, the size of d, μ, ν, and dμν.
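The constraints above are easy to check for a given candidate. The sketch below does not implement the ellipsoid method; it only tests feasibility, using an assumed two-state instance in which μ and ν are point distributions and d(u, v) = 1/2, so that d<sub>μν</sub> = 1/2. All names are illustrative.

```python
from fractions import Fraction

def is_feasible(x, d, mu, nu, target):
    """Check the constraints of the linear program for a candidate x."""
    states = list(x)
    nonexpansive = all(x[s] - x[t] <= d[s, t] for s in states for t in states)
    bounded = all(0 <= x[s] <= 1 for s in states)
    optimal = sum(x[s] * (mu[s] - nu[s]) for s in states) == target
    return nonexpansive and bounded and optimal

half = Fraction(1, 2)
d = {("u", "u"): 0, ("u", "v"): half, ("v", "u"): half, ("v", "v"): 0}
mu = {"u": 1, "v": 0}   # point distribution at u
nu = {"u": 0, "v": 1}   # point distribution at v

def candidate(q):
    """The family x_u = 1/2 + q, x_v = q."""
    return {"u": half + q, "v": q}
```

Every `candidate(q)` with 0 ≤ q ≤ 1/2 is feasible for this instance, so the feasible set is a line segment whose two endpoints, q = 0 and q = 1/2, are its vertices. This illustrates why the result of line 5 need not be unique.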

Let n ≥ 1 and (s, t) ∈ S<sup>2</sup><sub>?</sub>. Since we can show that δ<sub>n</sub> is rational and δ<sub>n</sub>(s, s) = 0 for all s ∈ S, we can apply FindVertex to δ<sub>n</sub>, τ(s) and τ(t). In this case, line 4 computes inf<sub>ω∈Ω<sub>ℝ</sub>(τ(s),τ(t))</sub> ω · δ<sub>n</sub>, which equals δ<sub>n+1</sub>(s, t). As a consequence, FindVertex(δ<sub>n</sub>, τ(s), τ(t)) returns f<sup>n</sup><sub>st</sub> ∈ (S, δ<sub>n</sub>) ↠ (ℚ ∩ [0, 1]) such that f<sup>n</sup><sub>st</sub> · (τ(s) − τ(t)) = δ<sub>n+1</sub>(s, t).

As we already observed above, line 4 can be computed in polynomial time in the size of the labelled Markov chain and line 5 can be computed in polynomial time in the size of δ<sub>n</sub>, τ(s), τ(t), and δ<sub>n+1</sub>(s, t), which we can show to be polynomial in the size of the labelled Markov chain and n. Hence, the running time of FindVertex(δ<sub>n</sub>, τ(s), τ(t)) is polynomial in the size of the labelled Markov chain and n.

## 7 The Algorithm

Given a labelled Markov chain ⟨S, L, τ, ℓ⟩ and N ∈ ℕ, we can explain the distances δ(s, t) for s, t ∈ S by computing the formulas ϕ<sup>n</sup><sub>st</sub> for 0 ≤ n ≤ N. To obtain this sequence of formulas, we implement Definition 21 as follows. Below, for s, t, u ∈ S, we use the array cells distance[s][t], function[s][t][u], and formula[s][t][n] to represent the distance δ<sub>n−1</sub>(s, t), the function value f<sup>n−1</sup><sub>st</sub>(u), and the formula ϕ<sup>n</sup><sub>st</sub>, respectively. In lines 5–17, we compute δ<sub>0</sub>, f<sup>0</sup><sub>st</sub>, ϕ<sup>0</sup><sub>st</sub>, and ϕ<sup>1</sup><sub>st</sub>. The loop of lines 20–50 first computes the distances δ<sub>n</sub> (lines 21–27), then determines the function f<sup>n</sup><sub>st</sub> (line 30), and finally computes the formulas ϕ<sup>n+1</sup><sub>st</sub> (lines 31–49).

```
1  ExplainDistances(τ, ℓ, N):
2    input: τ ∈ S → D_ℚ(S), ℓ ∈ S → L, N ≥ 1
3    output: ϕⁿ_st for all 0 ≤ n ≤ N and all s, t ∈ S
4    ∼ = DecideProbabilisticBisimilarity(τ, ℓ)
5    for s ∈ S and t ∈ S
6      formula[s][t][0] = false
7      distance[s][t] = 0
8      if s ∼ t
9        for 1 ≤ n ≤ N
10         formula[s][t][n] = false
11     else if ℓ(s) ≠ ℓ(t)
12       for 1 ≤ n ≤ N
13         formula[s][t][n] = ℓ(s)
14     else
15       formula[s][t][1] = false
16       for u ∈ S
17         function[s][t][u] = 0
18
19   n = 1
20   while n < N
21     for s ∈ S and t ∈ S
22       if ℓ(s) ≠ ℓ(t)
23         distance[s][t] = 1
24       if s ≁ t ∧ ℓ(s) = ℓ(t)
25         distance[s][t] = 0
26         for u ∈ S
27           distance[s][t] += function[s][t][u] ∗ (τ(s)(u) − τ(t)(u))
28     for s ∈ S and t ∈ S
29       if s ≁ t ∧ ℓ(s) = ℓ(t)
30         function[s][t] = FindVertex(distance, τ(s), τ(t))
31         conjunction = true
32         for u ∈ S
33           disjunction = false
34           for v ∈ S
35             if function[s][t][u] = function[s][t][v]
36               subformula = false ⊕ function[s][t][u]
37             else
38               minusShift = distance[u][v] − |function[s][t][u] − function[s][t][v]|
39               plusShift = min{function[s][t][u], function[s][t][v]}
40               if function[s][t][u] > function[s][t][v]
41                 subformula = (formula[u][v][n] ⊖ minusShift) ⊕ plusShift
42               else
43                 subformula = (formula[v][u][n] ⊖ minusShift) ⊕ plusShift
44             disjunction ∨= subformula
45           conjunction ∧= disjunction
46         shift = 0
47         for u ∈ S
48           shift += function[s][t][u] ∗ τ(t)(u)
49         formula[s][t][n + 1] = (○ conjunction) ⊖ shift
50     n = n + 1
```
Let us first discuss the correctness of the above algorithm. In line 4, ∼ is computed by deciding probabilistic bisimilarity. The loop spanning line 20–50 has the following invariant.

$$\forall s, t \in S: \text{distance}[s][t] = \delta\_{n-1}(s, t) \tag{1}$$

$$\forall (s, t) \in S\_?^2: \forall u \in S: \text{function}[s][t][u] = f\_{st}^{n-1}(u) \tag{2}$$

$$\forall s, t \in S: \forall 0 \le i \le n: \text{formula}[s][t][i] = \varphi\_{st}^i \tag{3}$$

Let us check that the above loop invariant holds when we reach line 21 for the first time. In line 7 we set distance to zero. Hence, (1) is satisfied when we reach line 21. In line 17 we set function to zero. Hence, (2) is also satisfied when we reach line 21. In line 6, 10, 13, and 15 we set formula such that (3) is satisfied when we reach line 21.

Next, we check that the loop maintains the above invariant, that is, if the invariant holds at line 21 then it also holds at line 50. Assume that the invariant holds at line 21. From (2) and line 22–27 we can conclude that

$$\text{distance}[s][t] = \begin{cases} 0 & \text{if } (s,t) \in S\_0^2\\ 1 & \text{if } (s,t) \in S\_1^2\\ f\_{st}^{n-1} \cdot (\tau(s) - \tau(t)) \text{ otherwise} \end{cases}$$

once we arrive at line 28. Hence, from Proposition 19 we can conclude that distance[s][t] = δ<sub>n</sub>(s, t) for all s, t ∈ S. Therefore, (1) holds at line 50.

Since distance = δ<sub>n</sub> at line 30 and, as we have seen in Section 6, FindVertex(δ<sub>n</sub>, τ(s), τ(t)) returns f<sup>n</sup><sub>st</sub>, we assign f<sup>n</sup><sub>st</sub> to function[s][t] in line 30. Hence, (2) holds at line 50. We can also verify that lines 31–49 ensure that (3) is maintained by the loop.

Finally, we will argue that the running time of the above algorithm is polynomial in the size of the labelled Markov chain and N. Probabilistic bisimilarity can be decided in polynomial time as was first shown by Baier [3]. More efficient algorithms have been proposed by Buchholz [7], Derisavi, Hermanns, and Sanders [17] and Valmari and Franceschinis [50]. Hence, line 4 is polynomial time.

Each of lines 6–17 can be implemented in constant time. Since each line of this part is executed at most N|S|<sup>3</sup> times, the running time of lines 5–17 is polynomial in the size of the labelled Markov chain and N.

The loop consisting of lines 20–50 is executed N − 1 times. As we already discussed in Section 6, the running time of FindVertex(δ<sub>n</sub>, τ(s), τ(t)) is polynomial in the size of the labelled Markov chain and n. When we arrive at line 30, distance equals δ<sub>n</sub> and, hence, this line is polynomial in the size of the labelled Markov chain and n. All other lines of the loop can be implemented in constant time. Each line is executed at most |S|<sup>4</sup> times. Therefore, the running time of lines 20–50 is polynomial in the size of the labelled Markov chain and N.

## 8 Conclusion

In this paper, we study a minor variation of the logic introduced by Desharnais et al. in [20]. In particular, we show that


As pointed out by Hillerström in [30], an early paper on computing distinguishing formulas, to explain why states are not bisimilar "arguments must be concise in the sense that an argument must not contain redundant or irrelevant information." This applies to our setting as well. The distinguishing formulas introduced in Definition 21 are in many cases far from concise. We leave the simplification of these formulas for future research.

One may wonder whether adding fixed points to the logic, in the form variables X and either operators μX and νX or equations of the form X = ϕ, would allow us to explain the probabilistic bisimilarity distance of two states by means of a single formula. A logic similar to the one studied in this paper that contains fixed points has been studied by De Alfaro et al. [1]. Whether simply adding fixed points to the logic suffices is not immediately clear as the p<sup>n</sup> and ⊕q<sup>n</sup> in the formula ψ<sup>n</sup> uvst vary as n varies. Extending the logic so that the probabilistic bisimilarity distance of two states can be explained by means of a single formula is another potential topic for future research.

Graf and Sifakis [28] introduce the notion of a characteristic formula for a state s: a state satisfies this formula if and only if it is behaviourally equivalent to s. Characteristic formulas have been developed for probabilistic bisimilarity. For example, Deng and van Glabbeek [16] present characteristic formulas for probabilistic automata. Sack and Zhang [47] introduce a general framework to construct characteristic formulas for probabilistic automata. In the setting of probabilistic bisimilarity distances, a characteristic formula for a state s of a labelled Markov chain can be formalized in the following ways. The formula ϕ<sup>s</sup> is a characteristic formula for the state s if

$$\text{for all states } t, \ \lceil \varphi_s \rceil (s) - \lceil \varphi_s \rceil (t) = \delta (s, t) \tag{4}$$

or

$$\text{for all states } t, \ \lceil \varphi_s \rceil (t) = \delta(s, t). \tag{5}$$

It can be shown that (4) and (5) are equivalent: if there exists a formula that satisfies (4) then there also exists a (different) formula that satisfies (5). Whether such a formula or a sequence of such formulas exists for the logic studied in this paper is an open question that may be tackled in future research.

A preliminary implementation of the algorithm in Java is available [45]. Improving the code is another avenue for further research.

Acknowledgements The authors thank the referees for their very detailed and constructive feedback. The second author thanks the Department of Computer Science of the University of Oxford for hosting him for his sabbatical during which part of this research was carried out.

## References





## **Weighted and Branching Bisimilarities from Generalized Open Maps**

Jérémy Dubut<sup>1</sup> and Thorsten Wißmann<sup>2,3,‹</sup>

<sup>1</sup> National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

jeremy.dubut@aist.go.jp

<sup>2</sup> Radboud University, Nijmegen, the Netherlands

t.wissmann@cs.ru.nl

<sup>3</sup> Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

**Abstract.** In the open map approach to bisimilarity, the paths and their runs in a given state-based system are the first-class citizens, and bisimilarity becomes a derived notion. While open maps were successfully used to model bisimilarity in non-deterministic systems, the approach fails to describe quantitative system equivalences such as probabilistic bisimilarity. In the present work, we show that such a description is indeed impossible, and we therefore generalize the notion of open maps to also accommodate weighted and probabilistic bisimilarity. Also, extending the notions of strong path and path bisimulations into this new framework, we show that branching bisimilarity can be captured by this extended theory and that it can be viewed as the history-preserving restriction of weak bisimilarity.

**Keywords:** Open maps · Weighted Bisimilarity · Probabilistic Bisimilarity · Branching Bisimilarity · Weak Bisimilarity

## **1 Introduction**

The theory of open maps is a categorical framework to reason about systems and their bisimilarities [16]. Given a category of systems and a description of the shape of the executions and how to extend them, open maps are morphisms with lifting properties with respect to those extensions. Intuitively, open maps are morphisms which preserve and reflect transitions of systems, that is, they are morphisms whose graphs are bisimulations. The theory covers various classical notions of bisimilarity. For example, two LTSs are strongly bisimilar if and only if there is a span of open maps between them. Varying the category of models and the execution shapes allows describing weak bisimilarity, timed bisimilarity, probabilistic Larsen and Skou bisimilarity, and history-preserving bisimilarity of event structures (see [16,3,12] for examples).

Another categorical framework for bisimilarity is coalgebra [22]. This time, given a category and an endofunctor describing respectively the type of state spaces and the type of transitions, a 'system' is understood as a coalgebra for this functor. Coalgebra homomorphisms are then very similar to open maps in spirit: they also are morphisms that preserve and reflect transitions. This intuition has been made formal by transformations between the categorical frameworks in both directions: from open maps to coalgebra [19], and conversely [25]. However, the latter suggests that open maps are only adapted to modeling non-deterministic systems and would struggle with other types of branching, such as probabilistic branching.

<sup>‹</sup> Supported by the NWO TOP project 612.001.852.

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 308–327, 2023. https://doi.org/10.1007/978-3-031-30829-1_15

In coalgebra, there are no particular difficulties in modeling weighted systems, and by extension, discrete probabilistic systems [17]. There is also some work for continuous probabilities, although the theory is much more complicated [5,4]. As we will explain more precisely later, there have been some attempts to do so with open maps in [3,5], but the result is somewhat disappointing.

Conversely, coalgebra is not adapted to bisimilarities for systems where transitions are not history-preserving, that is, for which the behavioral equivalence does not just depend on the transitions at a given state, but on the whole history of the execution that led to this state. That is the case for example for branching bisimilarity [23]. Branching bisimilarity arose precisely to make weak bisimilarity history-preserving. In [3], weak bisimilarity has been described using open maps by carefully choosing the underlying category, with a general theory developed in [9] using presheaf models. Branching bisimilarity has also been studied using open maps in [1,2], but indirectly, through a translation into presheaves.

To summarize, the goal of this paper is to capture weighted and branching bisimilarities using a generalization of open maps. Concretely, the contributions are:


Full proofs can be found in the appendix: http://arxiv.org/abs/2301.07004

## **2 From Path Categories to Bisimilarity**

Before discussing weighted bisimilarity, let us first recall the main ideas of modeling bisimilarity via open maps, as introduced by Joyal et al. [16]. The definition is parametric in a functor $J\colon \mathbb{P} \to \mathbb{M}$ from a category $\mathbb{P}$ of paths to a category $\mathbb{M}$ of models or systems of interest. In the prime example, $\mathbb{M}$ is the category $\mathsf{LTS}$ of labelled transition systems, as defined next:

**Definition 2.1.** *For a fixed set* $A$ *of labels, the category* $\mathsf{LTS}$ *contains:*

*1. Objects: a labelled transition system* $(X, \longrightarrow, x_0)$ *is a set* $X$ *of states, a transition relation* ${\longrightarrow} \subseteq X \times A \times X$ *and a distinguished initial state* $x_0 \in X$*. We write* $x \xrightarrow{a} x'$ *to denote that* $(x, a, x') \in {\longrightarrow}$ *and simply refer to the LTS as* $X$ *if* $\longrightarrow$ *and* $x_0$ *are clear from the context. For disambiguation, we use* $\to$ *for morphisms and* $\longrightarrow$ *for transitions.*

*2. Morphisms: a functional simulation* $f\colon (X, \longrightarrow, x_0) \to (Y, \longrightarrow, y_0)$ *is a function* $f\colon X \to Y$ *with* $f(x_0) = y_0$ *and for all* $x \xrightarrow{a} x'$ *in* $X$*, we have* $f(x) \xrightarrow{a} f(x')$*.*
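As an illustration of Definition 2.1, the following sketch checks whether a map between two finite LTSs is a functional simulation. The encoding is an assumption made only for this example: an LTS is a triple `(states, transitions, initial)` with transitions as `(source, label, target)` tuples, a map is a Python dictionary, and the name `is_functional_simulation` is hypothetical.

```python
def is_functional_simulation(f, X, Y):
    """Check that the dict f is a functional simulation from LTS X to LTS Y.

    An LTS is encoded (for this sketch only) as a triple
    (states, transitions, initial), with transitions a set of
    (source, label, target) tuples.
    """
    states_x, trans_x, init_x = X
    states_y, trans_y, init_y = Y
    # f must be a total function from the states of X to the states of Y.
    total = set(f) == set(states_x) and all(f[x] in states_y for x in states_x)
    return (
        total
        and f[init_x] == init_y  # initial state is preserved
        and all((f[x], a, f[x2]) in trans_y for (x, a, x2) in trans_x)
    )


# X has a single a-transition; Y has the same transition plus a b-transition.
X = ({0, 1}, {(0, "a", 1)}, 0)
Y = ({0, 1, 2}, {(0, "a", 1), (0, "b", 2)}, 0)
inclusion = {0: 0, 1: 1}
collapse = {0: 0, 1: 1, 2: 1}
```

Under these assumptions, `inclusion` is a functional simulation from `X` to `Y`, whereas `collapse` is not one from `Y` to `X`, because the b-transition of `Y` has no counterpart in `X`.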

A functional simulation $f\colon X \to Y$ intuitively means that the system $Y$ has at least the transitions of $X$, but possibly more. A special case of a functional simulation is the run of a word in a system:

**Definition 2.2.** For the label set $A$, let $(A^*, \le)$ be the partially ordered set of words, ordered by the prefix ordering. The functor $J\colon (A^*, \le) \to \mathsf{LTS}$ sends a word $w \in A^*$ to the LTS $Jw = (\{v \mid v \le w\}, \longrightarrow, \varepsilon)$ of all prefixes of $w$ with $v \xrightarrow{a} va$ for all $a \in A$, $va \le w$.

This functor $J$ (or more precisely, its image) is often called the path category of $\mathsf{LTS}$: the possible runs of a word $w \in A^*$ in $(X, \longrightarrow, x_0)$ correspond precisely to the functional simulations $Jw \to (X, \longrightarrow, x_0)$ in $\mathsf{LTS}$.
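The correspondence between runs and functional simulations out of $Jw$ can be made concrete with a small sketch. It assumes that words are Python strings whose characters are the labels, that an LTS is given by its set of `(source, label, target)` transitions and an initial state, and that the helper names `path_lts` and `runs` are hypothetical.

```python
def path_lts(word):
    """Build Jw: the states are the prefixes of word, with v -a-> va."""
    states = {word[:i] for i in range(len(word) + 1)}
    transitions = {(word[:i], word[i], word[: i + 1]) for i in range(len(word))}
    return states, transitions, ""  # the empty word is the initial state


def runs(word, transitions, initial):
    """Enumerate all functional simulations Jw -> X as dicts prefix -> state."""
    partial = [{"": initial}]
    for i, label in enumerate(word):
        # Extend every partial run by one matching transition of X.
        partial = [
            {**run, word[: i + 1]: target}
            for run in partial
            for (source, lbl, target) in transitions
            if source == run[word[:i]] and lbl == label
        ]
    return partial


T = {(0, "a", 1), (0, "a", 2), (1, "b", 3)}
```

With these definitions, `runs("ab", T, 0)` contains exactly one run, since only the a-successor `1` has an outgoing b-transition, while `runs("a", T, 0)` contains two.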

On the abstract level, for a general functor $J\colon \mathbb{P} \to \mathbb{M}$, we understand the set of morphisms $r\colon Jw \to X$ for $w \in \mathbb{P}$ and $X \in \mathbb{M}$ as the runs of the path $w$ in the model $X$. We can already make the trivial observation that all morphisms $f\colon X \to Y$ in $\mathbb{M}$ preserve runs: given a run $r\colon Jw \to X$ of some path $w \in \mathbb{P}$ in $X$, there is a run $f \cdot r\colon Jw \to Y$ of $w$ in $Y$.

The converse does not hold for a general $f\colon X \to Y$ in $\mathbb{M}$: given a run of $w$ in $Y$, there is not necessarily a run of $w$ in $X$. If $f$ reflects runs, it is called open:

**Definition 2.3.** For a functor $J\colon \mathbb{P} \to \mathbb{M}$, a morphism $f\colon X \to Y$ in $\mathbb{M}$ is called open if $f$ satisfies the following lifting property for all $e\colon v \to w$ in $\mathbb{P}$:

$$\text{for all}\quad \begin{array}{ccc} Jv & \xrightarrow{\ r\ } & X \\ {\scriptstyle Je}\downarrow & & \downarrow{\scriptstyle f} \\ Jw & \xrightarrow{\ s\ } & Y \end{array} \qquad \text{there is}\qquad \begin{array}{ccc} Jv & \xrightarrow{\ r\ } & X \\ {\scriptstyle Je}\downarrow & \nearrow{\scriptstyle d} & \downarrow{\scriptstyle f} \\ Jw & \xrightarrow{\ s\ } & Y \end{array}$$

That is, for all commutative squares ($s \cdot Je = f \cdot r$), there is $d\colon Jw \to X$ in $\mathbb{M}$ that makes both triangles on the right commute ($f \cdot d = s$ and $d \cdot Je = r$).

By construction, we can only make statements about states that are reachable via some run. Thus, one often restricts $\mathbb{M}$ beforehand to contain only models in which all states are reachable from the initial state.

For LTSs in which all states are reachable from the initial state, open maps are related to strong bisimulations [20]: open maps are precisely functions whose graph relation $\{(x, fx) \mid x \in X\}$ is a strong bisimulation. Reformulated in the context of allegories [10], open maps are precisely the maps in the allegory of relations that are strong bisimulations. It is then natural to recover bisimulations as tabulations of open maps, that is:

**Definition 2.4.** For a functor $J\colon \mathbb{P} \to \mathbb{M}$, we say that two models $X$ and $Y$ are $J$-bisimilar if there exist another model $Z$ and two $J$-open maps $f\colon Z \to X$ and $g\colon Z \to Y$, that is, if there is a span of $J$-open maps between them.
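For finite LTSs in which all states are reachable, the characterization recalled above (a map is open exactly when its graph is a strong bisimulation) suggests a direct test: a functional simulation is open when every transition leaving $f(x)$ is the image of a transition leaving $x$. The sketch below implements this one-step check and uses it to exhibit a span of open maps. The triple encoding of LTSs and the function names are assumptions made for this example, and reachability is not verified by the code.

```python
def is_functional_simulation(f, X, Y):
    """f preserves the initial state and every transition of X."""
    _, trans_x, init_x = X
    _, trans_y, init_y = Y
    return f[init_x] == init_y and all(
        (f[x], a, f[x2]) in trans_y for (x, a, x2) in trans_x
    )


def is_open(f, X, Y):
    """One-step lifting: each f(x) -a-> y2 in Y comes from some x -a-> x2 with f(x2) = y2."""
    states_x, trans_x, _ = X
    _, trans_y, _ = Y
    return is_functional_simulation(f, X, Y) and all(
        any(src == x and lbl == a and f[tgt] == y2 for (src, lbl, tgt) in trans_x)
        for x in states_x
        for (y, a, y2) in trans_y
        if y == f[x]
    )


# X has one a-transition, Z has two a-transitions to distinct states.
X = ({0, 1}, {(0, "a", 1)}, 0)
Z = ({0, 1, 2}, {(0, "a", 1), (0, "a", 2)}, 0)
merge = {0: 0, 1: 1, 2: 1}        # Z -> X, identifies both successors
identity = {0: 0, 1: 1, 2: 2}     # Z -> Z
inclusion = {0: 0, 1: 1}          # X -> Z
```

Here `merge` and `identity` are both open, so the span they form witnesses that `X` and `Z` are bisimilar in the sense of Definition 2.4. The map `inclusion` is a functional simulation but not open, since the transition of `Z` into state `2` is not reflected.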

Of course, $J$-bisimilarity is a reflexive (identities are open maps) and symmetric (by permuting $f$ and $g$ in the definition) relation on models, but it is not transitive in general. It is transitive when the category $\mathbb{M}$ has pullbacks [16].

Given a functor $J\colon \mathbb{P} \to \mathbb{M}$, there are more classical ways of defining bisimilarities given in [16]. The first one is *(strong) path bisimulations*, which are relations on runs (similar to history-preserving bisimulations) satisfying the usual bisimilarity conditions. The second one uses a modal logic, in the spirit of the Hennessy-Milner theorem. In the case of LTSs with strong bisimilarity, all those notions describe the same notion of bisimilarity, but that is not true for general $J\colon \mathbb{P} \to \mathbb{M}$: it can only be proved that $J$-bisimilarity implies the existence of a (strong) path bisimulation, which itself implies that the two models satisfy the same formulas of the modal logic. In [6], some mild sufficient conditions in terms of trees (i.e., colimits of paths in $\mathbb{M}$) are given for those three notions to coincide. In particular, all the examples of bisimilarities covered by open maps cited earlier satisfy these conditions.

We use coalgebra for uniform statements about state-based systems of different branching type (including non-deterministic and probabilistic branching):

**Definition 2.5.** *For an object* $1$ *of a category* $\mathbb{C}$ *and an endofunctor* $F\colon \mathbb{C} \to \mathbb{C}$*, a* pointed coalgebra *is a pair of morphisms of* $\mathbb{C}$ *of the form* $1 \xrightarrow{\ i\ } X \xrightarrow{\ \xi\ } FX$.

For example, LTSs can be modeled as pointed coalgebras with $\mathbb{C} = \mathsf{Set}$, $1$ any singleton, and $F = \mathcal{P}(A \times -)$, where $\mathcal{P}$ is the power set functor. The usual notion of morphisms of coalgebras can be spelt out as follows:

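**Definition 2.6.** *A* (proper) homomorphism *of pointed coalgebras from* $(X, \xi, i)$ *to* $(Y, \zeta, j)$ *is a morphism* $f\colon X \to Y$ *of* $\mathbb{C}$ *such that the following diagram commutes, that is,* $f \cdot i = j$ *and* $Ff \cdot \xi = \zeta \cdot f$*.*

$$\begin{array}{ccccc} 1 & \xrightarrow{\ i\ } & X & \xrightarrow{\ \xi\ } & FX \\ & {\scriptstyle j}\searrow & \downarrow{\scriptstyle f} & & \downarrow{\scriptstyle Ff} \\ & & Y & \xrightarrow{\ \zeta\ } & FY \end{array}$$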
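For the functor $F = \mathcal{P}(A \times -)$ on sets, the condition of Definition 2.6 can be checked mechanically on finite systems. The sketch below is only an illustration: it assumes that a coalgebra is a dictionary sending each state to a `frozenset` of `(label, successor)` pairs, that the point is given as a state, and that the helper names are hypothetical.

```python
def lts_to_coalgebra(states, transitions):
    """Turn a set of (source, label, target) triples into xi: X -> P(A x X)."""
    return {
        x: frozenset((a, t) for (s, a, t) in transitions if s == x)
        for x in states
    }


def powerset_action(f, successors):
    """Apply P(A x f): rename the successor component along f."""
    return frozenset((a, f[x]) for (a, x) in successors)


def is_proper_homomorphism(f, xi, i, zeta, j):
    """Check f . i = j and Ff . xi = zeta . f pointwise."""
    return f[i] == j and all(
        powerset_action(f, xi[x]) == zeta[f[x]] for x in xi
    )


xi = lts_to_coalgebra({0, 1, 2}, {(0, "a", 1), (0, "a", 2)})
zeta = lts_to_coalgebra({0, 1}, {(0, "a", 1)})
merge = {0: 0, 1: 1, 2: 1}

small = lts_to_coalgebra({0, 1}, {(0, "a", 1)})
big = lts_to_coalgebra({0, 1, 2}, {(0, "a", 1), (0, "b", 2)})
inclusion = {0: 0, 1: 1}
```

In this example `merge` is a proper homomorphism, whereas `inclusion` only preserves transitions and therefore fails the equality, because the b-transition of `big` is not reflected.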

Pointed coalgebras and proper homomorphisms always form a category, but in the case of LTSs as described above, this category is not equivalent to the category LTS. Indeed, proper homomorphisms are not just morphisms that preserve transitions, but similarly to open maps, they also reflect them. In [25], the authors proved that for a large class of endofunctors, whose coalgebras basically are non-deterministic, proper homomorphisms precisely correspond to J-open maps for a certain functor J. To model morphisms that are only required to preserve transitions, homomorphisms have to be made lax as follows (see [25]):

**Definition 2.7.** *Assume a relation* $\sqsubseteq$ *on every Hom-set* $\mathbb{C}(X, FY)$*. A* lax homomorphism *of pointed coalgebras from* $(X, \xi, i)$ *to* $(Y, \zeta, j)$ *is a morphism* $f\colon X \to Y$ *of* $\mathbb{C}$ *such that the diagram of Definition 2.6 laxly commutes, that is,* $f \cdot i = j$ *and* $Ff \cdot \xi \sqsubseteq \zeta \cdot f$ *in* $\mathbb{C}(X, FY)$*.*

In the case of the functor $\mathcal{P}(A \times -)$, we can consider the pointwise inclusion on every Hom-set $\mathsf{Set}(X, \mathcal{P}(A \times Y))$. With this, pointed coalgebras and lax homomorphisms form a category which is isomorphic to the category $\mathsf{LTS}$. However, it is not true in general that they form a category, as a compatibility of $\sqsubseteq$ with the composition is needed as follows:

**Definition 2.8.** *A* partial order on $F$ *is a collection of partial orders* $\sqsubseteq$*, one for each Hom-set of the form* $\mathbb{C}(X, FY)$*, such that*

$$\forall\, f_1, f_2\colon X \to FY,\ \ g\colon X' \to X,\ \ h\colon Y \to Y'\colon \quad f_1 \sqsubseteq f_2 \quad \Rightarrow \quad Fh \cdot f_1 \cdot g \sqsubseteq Fh \cdot f_2 \cdot g.$$

*This is equivalent to the requirement that the Hom-functor* $\mathbb{C}(-, F-)$ *factors through partially ordered sets:* $\mathbb{C}(-, F-)\colon \mathbb{C}^{\mathrm{op}} \times \mathbb{C} \to \mathsf{Pos}$.

*Remark 2.9.* The present definition subsumes the definition of order on a Set-functor established by Hughes and Jacobs [11, Def 2.1] (details in the appendix).

**Lemma 2.10 [25].** *When* $\sqsubseteq$ *is a partial order on* $F$*, pointed coalgebras and lax homomorphisms form a category, which we denote by* $\mathsf{LCoalg}(1, F)$*.*

Much as with open maps, many flavors of bisimilarity can be recovered using spans of proper homomorphisms:

**Definition 2.11.** *We say that two pointed coalgebras are coalgebraically bisimilar if there is a span of proper homomorphisms between them.*

There are many ways of defining bisimilarities in coalgebra (see [13] for an overview), but they coincide for the purpose of the present paper.

# **3 Weighted Bisimilarity and Open Maps**

In this section, we describe known attempts to model weighted systems, and particularly probabilistic ones, using open maps. They all work with some variation of the (discrete) distribution functor on $\mathsf{Set}$. We denote by $\mathcal{D}$ this functor, which maps a set $X$ to the set

$$\mathcal{D}X = \left\{ f \colon X \to \left[0, 1\right] \mid f^{-1}\left(\left(0, 1\right]\right) \text{ is finite and } \sum_{x \in X} f(x) = 1 \right\},$$

and by $\mathcal{D}_{\le 1}$ the variation where the condition $= 1$ is replaced by $\le 1$ (i.e. $\mathcal{D}_{\le 1} X := \mathcal{D}(X + 1)$). We will prove that, even though Larsen-Skou bisimulations for reactive systems can be modeled with open maps, this is impossible for bisimulations for generative systems.
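The relation between sub-distributions on $X$ and distributions on $X + 1$ mentioned above can be illustrated on finitely supported examples. The following is a minimal sketch, assuming that a (sub-)distribution is a dictionary from elements to exact `Fraction` weights and that the extra point of $X + 1$ is represented by a hypothetical marker `BOTTOM`.

```python
from fractions import Fraction

BOTTOM = "*"  # hypothetical name for the extra point of X + 1


def is_subdistribution(mu):
    """All weights lie in [0, 1] and the total mass is at most 1."""
    return all(0 <= p <= 1 for p in mu.values()) and sum(mu.values()) <= 1


def is_distribution(mu):
    """A sub-distribution whose total mass is exactly 1."""
    return is_subdistribution(mu) and sum(mu.values()) == 1


def to_full(mu):
    """View a sub-distribution on X as a distribution on X + 1.

    The missing mass is assigned to the extra point BOTTOM.
    """
    return {**mu, BOTTOM: 1 - sum(mu.values())}


mu = {"x": Fraction(1, 2), "y": Fraction(1, 4)}
```

In this example `mu` is a sub-distribution but not a distribution, and `to_full(mu)` assigns the missing mass `1/4` to `BOTTOM`, which yields a distribution on the enlarged set.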

### **3.1 Larsen-Skou Bisimilarity Using Open Maps**

In [3], Cheng et al. describe an open map situation for Probabilistic Transition Systems (PTSs), which correspond to coalgebras for the functor $(\mathcal{D}(-) + 1)^A$. In this setting, they consider Partial PTSs (PPTSs), which are coalgebras for $(\mathcal{D}^{\varepsilon}_{\le 1}(-) + 1)^A$, where the sub-probability distributions can have values in hyperreals, allowing infinitesimals $\varepsilon$. The category of PTSs embeds in that of PPTSs, and the path category is the full subcategory of PPTSs consisting of finite linear systems whose probabilities of transitions are infinitesimals. It is then proved that $J$-bisimilarity, restricted to PTSs, for this path category corresponds to Larsen-Skou's probabilistic bisimilarity [18].

This open map situation has been reformulated in [7] in terms of coreflections: the obvious functor from PPTSs to TSs is a coreflection whose left-adjoint maps an LTS $T$ to the PPTS whose underlying LTS is $T$ and where all transitions have infinitesimal probabilities. In general, given a coreflection $F\colon C \to D$ with left-adjoint $G$ and a path category $J$ on $D$, one automatically has the path category $G \circ J$ on $C$, and this construction preserves good properties of $J$. In particular, one has that two systems $A$ and $B$ are $(G \circ J)$-bisimilar if and only if $FA$ and $FB$ are $J$-bisimilar. Cheng et al.'s path category is obtained in this manner with the coreflection above and the standard path category on LTSs. In particular, it means that two PPTSs are bisimilar if and only if their underlying TSs are strongly bisimilar.

### **3.2 Impossibility Result for Generative Systems**

In [5], Desharnais et al. describe several bisimilarities for generative probabilistic systems, that is, coalgebras for the functor $\mathcal{D}_{\le 1}(A \times -)$, in a coalgebraic way. They pointed out that their efforts to model those bisimilarities using open maps failed [5, p. 188]. In the following, we show that it is in fact not possible. More precisely, we show that for generative probabilistic systems modeled by the category $\mathbb{M} := \mathsf{LCoalg}(1, \mathcal{D}_{\le 1}(A \times -))$, there is no open map characterization of the coalgebraic bisimilarity. Actually, the argument here is valid for many other types of weights and is not limited to reals.

Here, for two functions $f, g\colon X \to \mathcal{D}_{\le 1}(Y)$, $f \sqsubseteq g$ means that for all $x \in X$ and all $y \in Y$, $f(x)(y) \le g(x)(y)$, where $\le$ is the usual ordering on $[0, 1]$.

In this situation:

**Theorem 3.1.** For $\mathbb{M} := \mathsf{LCoalg}(1, \mathcal{D}_{\le 1}(A \times -))$ there is no category $\mathbb{P}$ and no functor $J\colon \mathbb{P} \to \mathbb{M}$ such that for every $h\colon X \to Y$ with reachable $X$ the following equivalence holds:

$$h \text{ is } J\text{-open} \iff h \text{ is a proper homomorphism}$$

and there is no $\mathbb{P}$ and no functor $J$ such that for every $X$ and $Y$:

$$X \text{ and } Y \text{ are } J\text{-bisimilar} \iff X \text{ and } Y \text{ are coalgebraically bisimilar.}$$

Proof (Sketch). By contradiction, assume that there is such a $J$. We prove that there is a proper homomorphism of the form:

$$X = \Big( \bullet \xrightarrow{\ 1/n,\ a\ } x_k \quad \text{for } k = 1, \dots, n \Big) \qquad \xrightarrow{\ h\ } \qquad Y = \Big( \bullet \xrightarrow{\ 1,\ a\ } y \Big)$$

which cannot be $J$-open. Consider first the unique lax homomorphism $0_{\mathbb{M}} \to Y$, where $0_{\mathbb{M}}$ consists of one state and no transition. This is not a proper homomorphism, so it is not open by assumption. That is, there is a square:

$$\begin{array}{ccc} JP & \longrightarrow & 0_{\mathbb{M}} \\ {\scriptstyle J\phi}\downarrow & & \downarrow \\ JQ & \longrightarrow & Y \end{array}$$

with no lifting. It is mechanical to check that $JP \cong 0_{\mathbb{M}}$ and that $JQ$ has at least one transition $r \xrightarrow{w,\ a} z$ from its initial state $r$ to another state $z$ with $w \neq 0$. With $n = 2 \cdot \lceil \frac{1}{w} \rceil$, the proper homomorphism $h$ above is not open: there cannot be a morphism from $JQ$ to $X$ because $w > \frac{1}{n}$. $\square$
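The last step of the proof sketch rests on the observation that $n = 2 \cdot \lceil 1/w \rceil \ge 2/w$, hence $1/n \le w/2 < w$. The following sketch replays this step on one concrete weight, under several simplifying assumptions that are not taken from the paper: systems are dictionaries sending a state to a dictionary of `(label, successor)` pairs with `Fraction` weights, $JQ$ is replaced by the simplest possible shape (a single transition of weight `w`), and all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def pushforward(f, mu):
    """Add up the weights of all (label, state) pairs with the same image under f."""
    out = {}
    for (a, x), weight in mu.items():
        key = (a, f[x])
        out[key] = out.get(key, Fraction(0)) + weight
    return out


def is_lax(f, xi, i, zeta, j):
    """f(i) = j and Ff . xi is pointwise below zeta . f."""
    return f[i] == j and all(
        weight <= zeta[f[x]].get(key, Fraction(0))
        for x in xi
        for key, weight in pushforward(f, xi[x]).items()
    )


def is_proper(f, xi, i, zeta, j):
    """f(i) = j and Ff . xi equals zeta . f (zero weights are ignored)."""
    def support(mu):
        return {k: v for k, v in mu.items() if v != 0}
    return f[i] == j and all(
        support(pushforward(f, xi[x])) == support(zeta[f[x]]) for x in xi
    )


def counterexample(w):
    """Build X, Y and h as in the proof sketch, with n = 2 * ceil(1 / w)."""
    n = 2 * ceil(1 / w)
    targets = [f"x{k}" for k in range(1, n + 1)]
    X = {"x0": {("a", t): Fraction(1, n) for t in targets}}
    X.update({t: {} for t in targets})
    Y = {"y0": {("a", "y"): Fraction(1)}, "y": {}}
    h = {"x0": "y0", **{t: "y" for t in targets}}
    return n, X, Y, h


w = Fraction(2, 5)
n, X, Y, h = counterexample(w)
Q = {"r": {("a", "z"): w}, "z": {}}  # simplest stand-in for JQ
```

For `w = 2/5` this gives `n = 6`. The map `h` is a proper homomorphism because the six weights `1/6` add up to `1`, yet no assignment of `z` to a state of `X` yields a lax homomorphism from `Q`, since every transition of `X` carries weight `1/6 < 2/5`.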

## **4 Generalized Open Maps**

The main argument of the proof of impossibility is the fact that sometimes, a transition with some probability $w$ in the codomain comes from probabilities $w_1, \dots, w_n$ with $\sum_i w_i = w$ in the domain, which makes a lifting morphism impossible with the current framework of open maps.

In this section, we will extend the open map framework with the main intuition that the lifting morphism *splits* the probability $w$ into smaller parts $w_1, \dots, w_n$. After defining these generalized open maps, we show some basic properties of the bisimilarity generated by them.

## **4.1 Generalized Open Maps Situation**

Here, we describe our extension of the open maps framework. The data is similar: we start with a category of models $\mathbb{M}$, but we need more than just a functor $J\colon \mathbb{P} \to \mathbb{M}$. Assume:


The classical open maps situation $J\colon \mathbb{P} \to \mathbb{M}$ fits in this extension as follows. The category $\mathbb{E}$ is given by $\mathbb{P}$ with the intention that they model path shapes and their *extensions*. The functor $J_{\mathbb{E}}$ is given by $J$. The category $\mathbb{S}$ is given by the discrete category $|\mathbb{P}|$, that is, the category whose objects are those of $\mathbb{P}$ and whose morphisms are only identities. The functor $J_{\mathbb{S}}$ is the only possible one respecting the conditions of the definition above.

In the general context of this extension, the interpretation is a bit different. Now $V$ is meant to be a set of trees labelled by alphabets and weights. $\mathbb{E}$ still consists of extensions, extending trees into trees with longer branches. $\mathbb{S}$ then consists of *merging morphisms*, similar to the description above: for the example of weighted systems, those morphisms are allowed to merge states into one, as long as they sum up the weights of the in-going branches. Generally, those morphisms are allowed to perform some merges that are harmless for bisimilarity. With this data, we can define generalized open maps:

**Definition 4.1.** *A morphism* $f\colon X \to Y$ *in* $\mathbb{M}$ *is called* $(\mathbb{E}, \mathbb{S})$-open *if it satisfies the following* lifting property *for all* $e\colon v \to w$ *in* $\mathbb{E}$*:*

The interpretation starts the same as in usual open maps. Assume that we have a tree $y$ in $Y$ extending the image by $f$ of the tree $x$ in $X$. If $f$ is open, there should be a tree $x'$ extending $x$ and whose image by $f$ is $y$. However, $x'$ may have a different shape than $y$, since it might be necessary to split transitions. That is what $u$ and $s$ are modeling: $w$ is obtained from $u$ by merging some states.

The connection with the classical open maps can be formulated as follows:

**Proposition 4.2.** *Given a functor* $J\colon \mathbb{P} \to \mathbb{M}$ *and a morphism* $f\colon X \to Y$*,*

$f$ *is* $J$*-open if and only if* $f$ *is* $(\mathbb{P}, |\mathbb{P}|)$*-open.*

Again, bisimilarity can be defined as the existence of a span of open maps:

**Definition 4.3.** *We say that* $X$ *and* $Y$ *are* $(\mathbb{E}, \mathbb{S})$*-bisimilar if there is a span of* $(\mathbb{E}, \mathbb{S})$*-open maps between them.*

#### **4.2 Basic Properties**

In this section, we will prove general properties of $(\mathbb{E}, \mathbb{S})$-bisimilarity similar to the classical case. First, we show that if $\mathbb{M}$ has pullbacks, then $(\mathbb{E}, \mathbb{S})$-bisimilarity is an equivalence relation. Secondly, we describe two notions of path bisimulations, both implied by $(\mathbb{E}, \mathbb{S})$-bisimilarity. Finally, we prove that it is enough to check openness on some generators of $\mathbb{E}$.

In order to see when $(\mathbb{E}, \mathbb{S})$-bisimilarity is an equivalence relation, we need to check symmetry, reflexivity, and transitivity. *Symmetry* always holds because we can always swap the legs of the span. For *reflexivity*, it is enough to prove that identities are open, which holds because $\mathbb{S}$ is a category and $J_{\mathbb{S}}$ is a functor, as shown in the diagram on the right. The proof of *transitivity* relies on composition and pullbacks:

**Lemma 4.4.** $(\mathbb{E}, \mathbb{S})$*-open maps are closed under composition and pullbacks.*

**Theorem 4.5.** *If* $\mathbb{M}$ *has pullbacks, then* $(\mathbb{E}, \mathbb{S})$*-bisimilarity is a transitive relation, and thus is an equivalence relation.*

**Generalized Path Bisimulations.** In the classical open map setup [16], another notion of bisimilarity can be defined by using path extensions directly: so-called strong path and path bisimulations, which can be generalized as follows. As in the original setting [16], we assume that there is an element $0 \in V$ such that $J0$ is an initial object of $\mathbb{M}$ (note that $0$ is not required to be initial in $\mathbb{E}$ or $\mathbb{S}$). The intuition is that the unique morphism $!_X\colon J0 \to X$ points to the initial state of $X$. For example, $J0$ can be given by $(1, \mathrm{id}_1, \bot)$ in a category of pointed coalgebras if $1$ is the final object of $\mathbb{C}$ and if $\mathbb{C}(1, F1)$ has a least element $\bot\colon 1 \to F1$ (those conditions hold in the cases of interest).

**Definition 4.6.** *A* path simulation *from* $A$ *to* $B$ *in* $\mathbb{M}$ *is a set* $R$ *of spans of the form* $A \xleftarrow{\ a\ } Jv \xrightarrow{\ b\ } B$ *(for* $v \in V$*) satisfying the following two properties:*


*We say that* $R$ *is a* strong path simulation *if it additionally satisfies the following:*

**–** *backward closure: for all spans* $A \xleftarrow{\ a\ } Jv \xrightarrow{\ b\ } B$ *in* $R$ *and all* $e\colon w \to v \in \mathbb{E}$*, we have that the span* $A \xleftarrow{\ a \cdot J_{\mathbb{E}} e\ } Jw \xrightarrow{\ b \cdot J_{\mathbb{E}} e\ } B$ *belongs to* $R$*.*

*We say that* $R$ *is a* (strong) path bisimulation *from* $A$ *to* $B$ *if* $R$ *and* $R^{\dagger} = \{B \xleftarrow{\ b\ } Jv \xrightarrow{\ a\ } A \mid A \xleftarrow{\ a\ } Jv \xrightarrow{\ b\ } B \in R\}$ *are (strong) path simulations.*

Note that this version of (strong) path bisimulations has the same type as the one by Joyal et al. [16], but satisfies more general conditions. In particular, when $\mathbb{S}$ is a discrete category, the formulation above is exactly the one from [16]. Obviously, a strong path bisimulation is a path bisimulation.

The main result of this section is the following.

**Theorem 4.7.** *Assume two models* $A$ *and* $B$ *in* $\mathbb{M}$*. If there is a span* $A \xleftarrow{\ f\ } C \xrightarrow{\ g\ } B$ *where* $g$ *is a morphism of* $\mathbb{M}$ *and* $f$ *is an* $(\mathbb{E}, \mathbb{S})$*-open map, then the following set is a strong path simulation:*

$$R_{f,g} := \{ A \xleftarrow{a} Jv \xrightarrow{b} B \mid \exists c \colon Jv \to C \text{ with } a = f \cdot c \text{ and } b = g \cdot c \} $$

*Consequently, if* $A$ *and* $B$ *are* $(\mathbb{E}, \mathbb{S})$*-bisimilar, then there is a strong path bisimulation between them.*

As in the classical case of [16], there is no reason for the converse to be true in general: there might be a strong path bisimulation between two models, but no span of generalized open maps. However, conditions from [6] could be accommodated to describe a general framework in which the converse holds. Since this is not the main focus of this paper, we will not do it here, but will show a particular case in Section 6.

**Generators of the Category of Extensions.** In the first example of open maps for LTSs introduced in Section 2, the path category was described as the poset of words with the prefix order. Consequently, to prove that a functional simulation is $J$-open, we have to prove the lifting property of Definition 4.1 with respect to all pairs $w \le w'$. However, it is sufficient to check the lifting property for extensions by one letter: $w' = w.a$ for some $a \in A$. The general reason is that, as a category, $(A^*, \le)$ is generated by the morphisms $w \le w.a$, and verifying the lifting property with respect to generators of the category $\mathbb{P}$ is enough to obtain $J$-openness. This can be extended to generalized open maps, with additional care.

**Proposition 4.8.** *Assume a subgraph* $\mathbb{E}'$ *of* $\mathbb{E}$ *that generates* $\mathbb{E}$*, that is, every morphism of* $\mathbb{E}$ *is a finite composition of morphisms of* $\mathbb{E}'$*. Assume additionally that for every* $e \in \mathbb{E}'$ *and* $s \in \mathbb{S}$ *for which* $J_{\mathbb{E}} e \cdot J_{\mathbb{S}} s$ *is well-defined, there are* $s' \in \mathbb{S}$ *and* $e' \in \mathbb{E}'$ *such that* $J_{\mathbb{E}} e \cdot J_{\mathbb{S}} s = J_{\mathbb{S}} s' \cdot J_{\mathbb{E}} e'$*. In that case, if a morphism of* $\mathbb{M}$ *satisfies the lifting property of Definition 4.1 for all morphisms in* $\mathbb{E}'$*, then it is* $(\mathbb{E}, \mathbb{S})$*-open. Also, if a set of spans satisfies the conditions of Definition 4.6, where* $\mathbb{E}$ *is replaced by* $\mathbb{E}'$*, then it is a (strong) path bisimulation.*

The first condition is satisfied when $\mathbb{E}$ is a free category and $\mathbb{E}'$ is its class of generators. The second condition is satisfied for e.g. $\mathbb{E} = \mathbb{P}$ and $\mathbb{S} = |\mathbb{P}|$.

## **5 Open Maps for Weighted Systems**

In this section, we will prove that weighted systems can be captured by this generalized open map theory for a large variety of weights, including those needed to capture probabilistic systems.

#### **5.1 Category of Coalgebras for Weighted Systems**

In this section, we will consider weighted functors as follows.

**Definition 5.1.** *Given a commutative monoid $(K, +, e)$, the $K$-weighted functor $(K, +, e)^{(-)} \colon \mathsf{Set} \to \mathsf{Set}$ is defined as follows on sets and maps:*

$$\begin{array}{ll}
\textit{sets:} & X \mapsto (K, +, e)^{(X)} = \{\mu\colon X \to K \mid \mu^{-1}(K \setminus \{e\}) \text{ is finite}\} \\
\textit{maps:} & (f\colon X \to Y) \mapsto (K, +, e)^{(f)}(\mu) = \Big(y \in Y \mapsto \sum \{\mu(x) \mid x \in X,\ f(x) = y\}\Big)
\end{array}$$

An element $\mu$ of $(K, +, e)^{(X)}$ is a finite distribution sending each $x \in X$ to a weight in $K$. Whenever a map $f\colon X \to Y$ identifies elements $f(x_1) = f(x_2) = \cdots$, the functor action turns $\mu$ into a distribution on $Y$ by adding up the weights $\mu(x_1) + \mu(x_2) + \cdots$ as elements of $X$ are sent to the same element in $Y$. Since $\mu$ has finite support and $K$ is commutative, this addition is well-defined.
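The action of the weighted functor on maps can be illustrated on finite data. The following Python sketch is only an illustration, under the assumption that weights live in the monoid $(\mathbb{N}, +, 0)$ and that a finitely supported weight function is stored as a dictionary; the names `pushforward`, `add` and `unit` are hypothetical and do not come from the paper.

```python
def pushforward(f, mu, add=lambda a, b: a + b, unit=0):
    """Action of the K-weighted functor on a map f, applied to mu.

    mu is a dict holding finitely many entries of a weight function
    (missing keys implicitly carry the unit weight). The result sends
    each y to the sum of mu(x) over all x with f(x) == y.
    """
    result = {}
    for x, k in mu.items():
        y = f(x)
        result[y] = add(result.get(y, unit), k)
    # Drop unit weights so that the dict only records the support.
    return {y: k for y, k in result.items() if k != unit}


# Hypothetical example: f identifies x1 and x2, so their weights are added.
mu = {"x1": 2, "x2": 3, "x3": 1}
f = {"x1": "y1", "x2": "y1", "x3": "y2"}.get
print(pushforward(f, mu))
```

Because the support is finite, the loop terminates, and commutativity of the monoid makes the result independent of the order in which the entries are visited.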

Given a commutative monoid $(K, +, e)$ and an alphabet $A$, we want to consider weighted systems as coalgebras for the functor $(K, +, e)^{(A \times -)}$. As described in Section 2, we want to be able to talk about lax homomorphisms, so we need an order on $(K, +, e)^{(A \times -)}$ as in Definition 2.8. For that, we need to assume an ordered commutative monoid $(K, +, e, \subseteq)$, that is, a monoid $(K, +, e)$ with a partial order $\subseteq$ such that $+$ is monotone in both its arguments.

**Lemma 5.2.** *Given an ordered commutative monoid $(K, +, e, \subseteq)$, then for all sets $X$ and $Y$, the relation on the hom-set $\mathsf{Set}\big(X, (K, +, e)^{(A \times Y)}\big)$ defined by*

$$f_1 \subseteq f_2 \iff \forall x \in X,\ \forall y \in Y,\ \forall a \in A,\ f_1(x)(a, y) \subseteq f_2(x)(a, y)$$

*is an order on $(K, +, e)^{(A \times -)}$.*

So, we have a category $\mathsf{LCoalg}\big(1, (K, +, e)^{(A \times -)}\big)$ of pointed coalgebras and lax homomorphisms. The goal of this section is to design a generalized open maps situation for which $(E, S)$-bisimilarity characterizes coalgebraic bisimilarity and, more precisely, for which $(E, S)$-openness characterizes proper homomorphisms.

In the course of the constructions and proofs, we will need additional assumptions that we list here.

**Definition 5.3.** *We call an ordered commutative monoid $(K, +, e, \subseteq)$ a* rearrangement monoid *if it satisfies the additional requirement that if $n, m \ge 1$ and*

$$\sum_{i=1}^{n} x_i \subseteq \sum_{j=1}^{m} y_j,$$

*then there exists a family $(u_{i,j})_{1 \le i \le n,\ 1 \le j \le m}$ such that*

$$\text{for all } j,\ \sum_{i=1}^{n} u_{i,j} \subseteq y_j \quad \text{and for all } i,\ \sum_{j=1}^{m} u_{i,j} = x_i.$$

*In addition, we say that a rearrangement monoid is* strict *if the condition above holds also when replacing $\subseteq$ with $=$.*

The intuition is as follows. We have some weights arranged as $x_1, \dots, x_n$. We want to be able to decompose those weights into smaller weights, the $u_{i,j}$, and, by rearranging those small weights, to obtain weights smaller than the $y_j$. This condition states that this is possible when there is enough weight in total. The special case of strictness is called the *row-column property* in [17].
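For non-negative numbers with the usual addition and order, one possible way to obtain such a family is a greedy "pouring" of the $x_i$ into the capacities $y_j$. The Python sketch below is an illustration of the rearrangement condition only, not the argument used in the paper; the function name `rearrange` is hypothetical, and exact rational arithmetic is an assumption made to avoid rounding issues.

```python
from fractions import Fraction


def rearrange(xs, ys):
    """Return a matrix u with row sums equal to xs and column sums <= ys.

    Assumes non-negative weights with sum(xs) <= sum(ys); the weights in
    xs are poured greedily, left to right, into the capacities in ys.
    """
    if sum(xs) > sum(ys):
        raise ValueError("not enough total weight in ys")
    u = [[Fraction(0)] * len(ys) for _ in xs]
    remaining = [Fraction(y) for y in ys]  # capacity left in each column
    j = 0
    for i, x in enumerate(xs):
        todo = Fraction(x)  # part of x_i that is not yet distributed
        while todo > 0:
            if remaining[j] == 0:
                j += 1  # column j is full, move on to the next one
                continue
            piece = min(todo, remaining[j])
            u[i][j] += piece
            remaining[j] -= piece
            todo -= piece
    return u


u = rearrange([3, 4], [2, 6])
print([[str(v) for v in row] for row in u])
```

When the two totals are equal, every column ends up exactly full, which corresponds to the strict variant of the condition for this particular monoid.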

**Lemma 5.4.** *For any subgroup $G$ of the real numbers $(\mathbb{R}^n, +, -, 0)$ such that for all $x$, $y$ in $G$, $(\min(x_1, y_1), \dots, \min(x_n, y_n)) \in G$, the monoids $(G, +, 0, \le)$ and $(G_{\ge 0}, +, 0, \le)$, where $\le$ is the usual order on $\mathbb{R}^n$, are strict rearrangement monoids.*

*For any lattice with bottom element $(L, \le, \sqcup, \sqcap, \bot)$, $(L, \sqcup, \bot, \le)$ is a rearrangement monoid if and only if $(L, \le, \sqcup, \sqcap)$ is distributive. Furthermore, in that case, it is always strict.*

Another property is a form of positivity: we say that an ordered monoid is *positively ordered* if $e$ is the bottom element for $\subseteq$, that is, for all $k \in K$, $e \subseteq k$.

*Example 5.5.* The positive real line $(\mathbb{R}_+, +, 0, \le)$ is a positively ordered strict rearrangement monoid, and it is the one needed to define probabilistic systems. Another example is the monoid of natural numbers $(\mathbb{N}, +, 0, \le)$, which defines the bag functor. Finally, any distributive lattice with bottom element $(L, \sqcup, \bot, \le)$, typically powerset lattices $(\mathcal{P}(X), \cup, \emptyset, \subseteq)$, is one too. On the contrary, $(\mathbb{R}, +, 0, \le)$ and $(\mathbb{Z}, +, 0, \le)$ are strict rearrangement monoids but are not positively ordered. Conversely, $(\mathbb{N}_{\ge 1}, \times, 1, \le)$ is positively ordered but not a rearrangement monoid. Indeed, it is impossible to rearrange the inequality $2 \times 5 \le 3 \times 4$.
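The last claim can be checked by exhaustive search, since a rearrangement of $2 \times 5 \le 3 \times 4$ would consist of four positive integers $u_{1,1}, u_{1,2}, u_{2,1}, u_{2,2}$ with $u_{1,1} u_{1,2} = 2$, $u_{2,1} u_{2,2} = 5$, $u_{1,1} u_{2,1} \le 3$ and $u_{1,2} u_{2,2} \le 4$. The following Python sketch enumerates all candidates; the helper names are hypothetical and only serve this illustration.

```python
from itertools import product


def divisor_pairs(n):
    """All pairs (a, b) of integers >= 1 with a * b == n."""
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]


def rearrangements_2x2(x1, x2, y1, y2):
    """Families (u11, u12, u21, u22) in the multiplicative monoid on N>=1
    with u11 * u12 == x1, u21 * u22 == x2, u11 * u21 <= y1, u12 * u22 <= y2."""
    return [
        (u11, u12, u21, u22)
        for (u11, u12), (u21, u22) in product(divisor_pairs(x1), divisor_pairs(x2))
        if u11 * u21 <= y1 and u12 * u22 <= y2
    ]


# The inequality from the example: the search finds no suitable family.
print(rearrangements_2x2(2, 5, 3, 4))
# For comparison, an inequality that can be rearranged.
print(rearrangements_2x2(2, 3, 2, 3))
```

The search is finite because each row product fixes its two factors to a divisor pair of $x_i$.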

#### **5.2 Generalized Open Maps Situation for Weighted Systems**

Let $(K, +, e, \subseteq)$ be a commutative ordered monoid. Elements of $V_K$ are


The function $J_K$ maps

**–** a word $w = (a_1, k_1), \dots, (a_n, k_n)$ to the system

$$Jw = \boxed{\bullet 0 \xrightarrow{(a_1, k_1)} 1 \xrightarrow{(a_2, k_2)} \cdots \xrightarrow{(a_n, k_n)} n}$$

that is, to the coalgebra $Jw\colon \{0, \dots, n\} \to K^{(A \times \{0, \dots, n\})}$ such that if $b = a_{i+1}$ and $j = i + 1$ then $Jw(i)(b, j) = k_{i+1}$, else $Jw(i)(b, j) = e$.

**–** a triple $(w_1, b, w_2)$ with $w_1 = (a_1, k_1), \dots, (a_n, k_n)$ and $w_2 = l_1, \dots, l_m$ to the system

$$J(w_1, b, w_2) = \boxed{\bullet 0 \xrightarrow{(a_1, k_1)} 1 \xrightarrow{(a_2, k_2)} \cdots \xrightarrow{(a_n, k_n)} n \begin{array}{l} \xrightarrow{(b, l_1)} (n+1, 1) \\ \qquad \vdots \\ \xrightarrow{(b, l_m)} (n+1, m) \end{array}}$$

that is, $J(w_1, b, w_2)(n)(n + 1, i) = (b, l_i)$.
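The first case above, the linear system associated with a word, can be written down directly as finite data. The sketch below is a minimal illustration under the assumption that weights are numbers with unit $e = 0$; the names `linear_system` and `weight` are hypothetical.

```python
def linear_system(word, unit=0):
    """Coalgebra Jw for a word [(a_1, k_1), ..., (a_n, k_n)].

    States are 0..n. The result maps each state i to a dict from
    (label, target) to weight, listing only the non-unit weights.
    """
    n = len(word)
    system = {i: {} for i in range(n + 1)}
    for i, (label, k) in enumerate(word):
        if k != unit:
            system[i][(label, i + 1)] = k
    return system


def weight(system, i, label, j, unit=0):
    """Jw(i)(label, j), defaulting to the unit e for absent entries."""
    return system[i].get((label, j), unit)


jw = linear_system([("a", 2), ("b", 5)])
print(weight(jw, 0, "a", 1), weight(jw, 0, "b", 1))
```

Storing only non-unit entries mirrors the finite-support requirement of the weighted functor.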

The category $E_K$ is defined as follows. For every $w_1$, $b$, and $w_2$, there is a unique edge $e$ from $w_1$ to $(w_1, b, w_2)$. The functor then maps this edge $e$ to $J_E e$, the obvious injection.

The category $S_K$ has two types of morphisms:

**Fig. 1.** Example of a lifting of a path extension in $\mathbb{R}_+$-weighted systems, for a singleton label alphabet ($|A| = 1$), thus omitting action labels.


The functor $J_S$ then maps $s$ of the second type to the proper homomorphism $J_S s$ which maps $i$ to $i$ and $(n + 1, j)$ to $(n + 1, s(j))$.

As a piece of notation, for a morphism $x\colon Jw_1 \to X$, with $w_1$ of length $n$, we denote $x(n) \in X$ by $\mathrm{end}(x)$. We then say that a state $p$ of $X$ is reachable if there is a morphism of type $x\colon Jw_1 \to X$ with $\mathrm{end}(x) = p$. By extension, we say that $X$ is reachable if all its states are reachable.

#### **5.3 Equivalence between Open Maps and Proper Homomorphisms**

An example of an $(E, S)$-open map $h$ is provided in Figure 1, together with a path extension that is lifted. As is often the case for non-deterministic systems, the lifting map $d$ is not unique. Hence, only existence (and not uniqueness) is required in the lifting property. Since $h$ is a proper homomorphism, it provides a lifting for all extensions, as we show in general:

**Theorem 5.6.** *Assume a lax homomorphism $f\colon X \to Y$. If $f$ is $(E_K, S_K)$-open, $X$ is reachable, and $K$ is positively ordered, then $f$ is a proper homomorphism. Conversely, if $f$ is a proper homomorphism and $K$ is a rearrangement monoid, then $f$ is $(E_K, S_K)$-open. In particular, if $K$ is a positively ordered rearrangement monoid, two weighted systems $X$ and $Y$ are $(E_K, S_K)$-bisimilar if and only if they are coalgebraically bisimilar.*

For an endofunctor on Set, to prove that coalgebraic bisimilarity is an equivalence relation, it is enough to show that the functor preserves weak pullbacks. In the case of the weighted functor, this is given by strictness (see also [17]):

**Corollary 5.7.** *If $K$ is a positively ordered strict rearrangement monoid, then $(E_K, S_K)$-bisimilarity is an equivalence relation.*

#### **5.4 About Sub-distribution Functor**

Until now, we have not dealt with probabilistic systems, that is, coalgebras for the sub-distribution functor $\mathcal{D}_{\le 1}$. Those coalgebras are particular cases of coalgebras for the weighted functor $X \mapsto (\mathbb{R}_+, +)^{(X)}$. We want to show in this section that it is equivalent to consider coalgebras for $X \mapsto \mathcal{D}_{\le 1}(A \times X)$ as coalgebras for $X \mapsto (\mathbb{R}_+, +)^{(A \times X)}$, in the sense that two coalgebras for the former are bisimilar if and only if they are bisimilar when seen as coalgebras for the latter. The main ingredient is the following remark.

**Lemma 5.8.** *Assume a pointed coalgebra $1 \xrightarrow{i} X \xrightarrow{c} \mathcal{D}_{\le 1}(A \times X)$ and assume given a lax (resp. proper) homomorphism $f$ from $1 \xrightarrow{j} Y \xrightarrow{d} (\mathbb{R}_+, +)^{(A \times Y)}$ to $1 \xrightarrow{i} X \xrightarrow{c} \mathcal{D}_{\le 1}(A \times X) \subseteq (\mathbb{R}_+, +)^{(A \times X)}$. Then $Y \xrightarrow{d} \mathcal{D}_{\le 1}(A \times Y)$ and $f$ is a lax (resp. proper) homomorphism from $1 \xrightarrow{j} Y \xrightarrow{d} \mathcal{D}_{\le 1}(A \times Y)$ to $1 \xrightarrow{i} X \xrightarrow{c} \mathcal{D}_{\le 1}(A \times X)$.*

Note that this property is not true for the proper distribution functor $\mathcal{D}$. This suggests that we can define a generalized open maps situation $E_{\mathcal{D}}, S_{\mathcal{D}}$ for coalgebras for the functor $X \mapsto \mathcal{D}_{\le 1}(A \times X)$ by considering $E_{(\mathbb{R}_+, +)}, S_{(\mathbb{R}_+, +)}$ as defined in Section 5.2, and restricting it to those $v$ such that $Jv$ is a coalgebra for $X \mapsto \mathcal{D}_{\le 1}(A \times X)$.

**Corollary 5.9.** *A lax homomorphism from $1 \xrightarrow{j} Y \xrightarrow{d} \mathcal{D}_{\le 1}(A \times Y)$ to $1 \xrightarrow{i} X \xrightarrow{c} \mathcal{D}_{\le 1}(A \times X)$ is $(E_{\mathcal{D}}, S_{\mathcal{D}})$-open if and only if it is $(E_{(\mathbb{R}_+, +)}, S_{(\mathbb{R}_+, +)})$-open. Furthermore, two $\mathcal{D}_{\le 1}(A \times -)$-coalgebras are $(E_{\mathcal{D}}, S_{\mathcal{D}})$-bisimilar if and only if they are $(E_{(\mathbb{R}_+, +)}, S_{(\mathbb{R}_+, +)})$-bisimilar.*

Finally, the main result of this section:

**Theorem 5.10.** *Let $f\colon X \to Y$ be a lax homomorphism between $\mathcal{D}_{\le 1}(A \times -)$-coalgebras $(X, c, i)$ and $(Y, d, j)$. If $(X, c, i)$ is reachable and $f$ is $(E_{\mathcal{D}}, S_{\mathcal{D}})$-open, then $f$ is a proper homomorphism. Conversely, if $f$ is a proper homomorphism, then it is $(E_{\mathcal{D}}, S_{\mathcal{D}})$-open. Moreover, two $\mathcal{D}_{\le 1}(A \times -)$-coalgebras $(X, c, i)$ and $(Y, d, j)$ are $(E_{\mathcal{D}}, S_{\mathcal{D}})$-bisimilar if and only if they are coalgebraically bisimilar.*

## **6 Open Maps for Branching Bisimilarity**

In this section, we present a new way of modeling branching and weak bisimulations using our generalized framework of open maps. Using this additional flexibility, we do not need to rely on weak morphisms anymore, but on a slight modification of the morphism described in Definition 2.1. Concretely, we build a generalized open map situation such that stuttering branching bisimulations coincide with strong path bisimulations, and that in this case, they precisely characterize $(E, S)$-bisimilarity. In addition, in this framework, path bisimulations precisely correspond to weak bisimulations, witnessing branching bisimilarity as the history-preserving analogue to weak bisimilarity.

## **6.1 LTSs with Internal Moves, Category and Bisimilarities**

**Definition 6.1.** *For a fixed set $A$ of labels with a particular element $\tau$ (called* internal move*), the category $\mathsf{WLTS}$ contains the same objects as $\mathsf{LTS}$, and its morphisms $f\colon (X, \to, x_0) \to (Y, \to, y_0)$ are functions $f\colon X \to Y$ such that $f(x_0) = y_0$ and for all $x \xrightarrow{a} x'$ in $X$, we have $f(x) \xrightarrow{a} f(x')$, or $a = \tau$ and $f(x) = f(x')$.*
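On finite systems, the conditions of this definition can be checked mechanically. The following Python sketch is an illustration under assumed encodings: an LTS is a pair of a set of transition triples and an initial state, a candidate morphism is a dictionary, and the internal move is represented by the hypothetical constant `TAU`.

```python
TAU = "tau"  # hypothetical representation of the internal move


def is_wlts_morphism(f, source, target):
    """Check the conditions of a WLTS-morphism on finite LTSs.

    An LTS is given as (transitions, initial), where transitions is a set
    of triples (state, label, state); f is a dict from source states to
    target states.
    """
    src_trans, src_init = source
    tgt_trans, tgt_init = target
    if f[src_init] != tgt_init:
        return False
    for x, a, x2 in src_trans:
        simulated = (f[x], a, f[x2]) in tgt_trans   # transition is preserved
        collapsed = a == TAU and f[x] == f[x2]      # internal move is merged
        if not (simulated or collapsed):
            return False
    return True


source = ({(0, TAU, 1), (1, "a", 2)}, 0)
target = ({("p", "a", "q")}, "p")
print(is_wlts_morphism({0: "p", 1: "p", 2: "q"}, source, target))
```

In this example the map is a WLTS-morphism that is not strong, since the internal move of the source is collapsed rather than mapped to a transition of the target.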

LTS is a (non-full) subcategory of WLTS, and in fact the LTS-morphisms will be used later in the paper. For easier distinction, we use the terminology *strong morphisms* for WLTS-morphisms that are also in LTS (alluding to *strong bisimulations*, which were the bisimulation notion in LTS). Another notion of morphism is that of so-called *weak morphisms* [3]:

**–** if $x \xrightarrow{a} x'$ in $X$, then $f(x) \xrightarrow{\tau^\star} \xrightarrow{a} \xrightarrow{\tau^\star} f(x')$ in $Y$,

**–** if $x \xrightarrow{\tau} x'$ in $X$, then $f(x) \xrightarrow{\tau^\star} f(x')$ in $Y$.

Though we do not use weak morphisms in the following development of the paper, it is worth mentioning that the WLTS-morphisms form a proper subclass of the weak morphisms.

**Definition 6.2.** *A* branching bisimulation *from $(X, \to_X, i_X)$ to $(Y, \to_Y, i_Y)$ is a relation $R \subseteq X \times Y$ such that $(i_X, i_Y) \in R$, and for $(x, y) \in R$:*

$$\begin{array}{l}
\text{if } x \xrightarrow{a} x' \text{ then}\\
\quad a = \tau \text{ and } (x', y) \in R, \text{ or}\\
\quad y \xrightarrow{\tau} y_1 \xrightarrow{\tau} \dots \xrightarrow{\tau} y_n \xrightarrow{a} z_1 \xrightarrow{\tau} \dots \xrightarrow{\tau} z_m \text{ such that } (x, y_n),\ (x', z_1), \text{ and } (x', z_m) \in R;\\
\text{and symmetrically for transitions } y \xrightarrow{a} y'.
\end{array}$$

*If furthermore in the second condition $(x, y_i), (x', z_i) \in R$ for all $i$ (and symmetrically in the third condition), then $R$ is said to be* stuttering*.*

It is known from [23] that the largest branching bisimulation is stuttering, so that both notions generate the same bisimilarity. In the following, we will prove that strong path bisimulations are more naturally related to stuttering branching bisimulations thanks to their backward closure.

**Definition 6.3.** *A* weak bisimulation *from $(X, \to_X, i_X)$ to $(Y, \to_Y, i_Y)$ is a relation $R \subseteq X \times Y$ such that $(i_X, i_Y) \in R$, and for $(x, y) \in R$:*


It is clear that a (stuttering) branching bisimulation is a weak bisimulation.

#### **6.2 Generalized Open Maps for Branching Bisimulations**

In this section, we describe the generalized open maps situation that captures branching bisimulation. As for plain LTSs (Def. 2.2), elements of $V$ will be words on $A$, representing a finite linear LTS labelled by this word. However, to emphasize the particularity of the internal move $\tau$, we will provide another presentation here.

Here, $V$ is the set of sequences of the form $v = n_1, a_1, n_2, \dots, n_k, a_k, n_{k+1}$ such that $a_i \in A \setminus \{\tau\}$ and $n_i \in \mathbb{N}$, e.g. $\tau\tau a \tau b c \tau \mathrel{\widehat{=}} 2, a, 1, b, 0, c, 1$. The natural numbers $n_i \in \mathbb{N} \cong \{\tau\}^*$ represent the number of internal moves between two observable moves. Then, $J$ maps this sequence to the usual linear LTS:

Elements of $E$ append at most one observable (i.e. non-$\tau$) move:


The graph morphism $J_E\colon E \to M$ maps those edges to the obvious inclusion, mapping state $(i, j)$ of $Jv$ to the same state in $Jw$.
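The presentation of words as sequences $n_1, a_1, \dots, a_k, n_{k+1}$ used for $V$ above can be made concrete with a small conversion. The Python sketch below is an illustration only; the function names `encode` and `decode`, and the representation of $\tau$ as the string `"τ"`, are assumptions of the example.

```python
def encode(word, tau="τ"):
    """Turn a list of labels into the sequence n_1, a_1, ..., a_k, n_{k+1}."""
    sequence = [0]
    for label in word:
        if label == tau:
            sequence[-1] += 1        # one more internal move in this block
        else:
            sequence += [label, 0]   # observable move, then a new tau block
    return sequence


def decode(sequence, tau="τ"):
    """Inverse of encode: expand the counters back into tau labels."""
    word = []
    for item in sequence:
        word += [tau] * item if isinstance(item, int) else [item]
    return word


print(encode(list("ττaτbcτ")))
```

The call on the word $\tau\tau a\tau bc\tau$ reproduces the sequence $2, a, 1, b, 0, c, 1$ from the example in the text.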

Strictly speaking, $E$ is not a category, but just a graph, because we have $a \xrightarrow{e_b} ab$ and $ab \xrightarrow{e_c} abc$, but there is no morphism from $a$ to $abc$. To fit in the framework of Section 4, we take the free category $\mathrm{Free}(E)$ generated by this graph and the unique functor extending the graph homomorphism $J_E$. By Proposition 4.8, it is equivalent to consider $\mathrm{Free}(E)$ and $E$ for openness and path bisimulations, so we will talk of $(E, S)$-openness when we mean $(\mathrm{Free}(E), S)$-openness, and all the statements and proofs will be done using $E$ only.

Elements of $S$ are trickier to describe. The intuition is that they are morphisms that merge states. In the context of LTSs with internal moves, merging happens when the source and the target of a $\tau$-transition are mapped to the same state. This is crucial for the open maps we want to describe: to lift one $\tau$-transition, it might be necessary to use several $\tau$-transitions. With this knowledge, elements of $S$ are as follows.

**– Merging internal moves**: morphisms in $S$ from $v = n_1, a_1, \dots, a_k, n_{k+1}$ to $w = n'_1, a_1, \dots, a_k, n'_{k+1}$ with $n_i \ge n'_i$ are $(k + 1)$-tuples $s = (s_1, \dots, s_{k+1})$ of monotone surjective functions $s_i\colon \{0 < 1 < \dots < n_i\} \to \{0 < 1 < \dots < n'_i\}$. For example, there are two morphisms from $a\tau\tau b \mathrel{\widehat{=}} 0, a, 2, b, 0$ to $a\tau b \mathrel{\widehat{=}} 0, a, 1, b, 0$, one for each $\tau$ that can be dropped. The functor $J_S$ then maps $s$ to the morphism from $Jv$ to $Jw$ defined by $J_S(s)(i, j) = (s_j(i), j)$.
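The count of two morphisms in the example can be reproduced by enumerating monotone surjections. The following Python sketch is a brute-force illustration for small sequences, not an efficient algorithm; the function names are hypothetical and sequences are encoded as lists alternating counters and labels.

```python
from itertools import product


def monotone_surjections(n, m):
    """All monotone surjections {0 < ... < n} -> {0 < ... < m}, as tuples."""
    return [
        s for s in product(range(m + 1), repeat=n + 1)
        if all(s[i] <= s[i + 1] for i in range(n))   # monotone
        and set(s) == set(range(m + 1))              # surjective
    ]


def mergings(v, w):
    """Tuples (s_1, ..., s_{k+1}) of monotone surjections from v to w.

    v and w are sequences n_1, a_1, ..., a_k, n_{k+1}; there are no
    mergings unless both carry the same observable labels.
    """
    if v[1::2] != w[1::2]:
        return []
    components = [monotone_surjections(n, m) for n, m in zip(v[0::2], w[0::2])]
    return list(product(*components))


# The example from the text: a tau tau b merged onto a tau b.
print(len(mergings([0, "a", 2, "b", 0], [0, "a", 1, "b", 0])))
```

Each of the two results differs only in its middle component, which records which of the two internal moves is collapsed.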


As a piece of notation, for a morphism $x\colon J(n_1, a_1, \dots, a_k, n_{k+1}) \to X$, we denote $x(n_{k+1}, k + 1) \in X$ by $\mathrm{end}(x)$.

### **6.3 Equivalence of Bisimilarities**

In this section, we prove that $(E, S)$-bisimilarity indeed coincides with branching bisimilarity. To do so, we prove first that for the present instance of $E$ and $S$ (Sec. 6.2), $(E, S)$-bisimilarity coincides with strong path bisimilarity. In general, $(E, S)$-bisimilarity implies strong path bisimilarity (Theorem 4.7), so it remains to show the converse direction for the present instance. To this end, we start by internalizing strong path bisimulations into objects of LTS/WLTS, in order to relate them to open maps:

**Definition 6.4.** *For a strong path bisimulation $R$ from $X$ to $Y$, define the LTS $\widetilde{R} = (R, \to_R, (X \xleftarrow{!} J0 \xrightarrow{!} Y))$ to have transitions*

$$(X \xleftarrow{x} Jv \xrightarrow{y} Y) \xrightarrow{a} \_{R} (X \xleftarrow{x'} Jw \xrightarrow{y'} Y)$$


**Lemma 6.5.** *In $\mathsf{WLTS}$, we have projection maps $X \xleftarrow{\pi_X} \widetilde{R} \xrightarrow{\pi_Y} Y$ given by $\pi_X\colon (X \xleftarrow{x} Jv \xrightarrow{y} Y) \mapsto \mathrm{end}(x)$ and $\pi_Y\colon (X \xleftarrow{x} Jv \xrightarrow{y} Y) \mapsto \mathrm{end}(y)$. For every strong morphism $r\colon Jv \to \widetilde{R}$ (i.e. $r \in \mathsf{LTS}$),*

$$\operatorname{end}(r)\text{ is of the form }(X \xleftarrow{\pi_X \cdot r} Jv \xrightarrow{\pi_Y \cdot r} Y).$$

Note that in this statement, we require $r$ to be strong and not just a morphism of WLTS. With a morphism of WLTS, the statement would become that there is $s\colon v' \to v \in S$ such that $\pi_X \cdot r = x \cdot J_S s$ instead. For the characterization of open maps in WLTS, it suffices for our needs to restrict to strong morphisms:

**Lemma 6.6.** *For $f\colon X \to Y$ in $\mathsf{WLTS}$ to be $(E, S)$-open, it is sufficient to verify the lifting in Definition 4.1 in the special case of $x$ being a strong morphism.*

We use this simplification to prove that the projection maps $\pi_X$, $\pi_Y$ are open:

**Proposition 6.7.** *For a strong path bisimulation $R$ from $X$ to $Y$, the projections $X \xleftarrow{\pi_X} \widetilde{R} \xrightarrow{\pi_Y} Y$ are both $(E, S)$-open.*

The next step is to prove the equivalence between strong path and stuttering branching bisimulations.


**Table 1.** Equivalences of bisimilarity notions in LTSs with $\tau$-actions, for $X, Y \in \mathsf{WLTS}$

**Theorem 6.8.** *If $R$ is a stuttering branching bisimulation from $X$ to $Y$, then*

$$\overline{R} = \{ X \xleftarrow{x} Jv \xrightarrow{y} Y \mid v = (n\_1, a\_1, \dots, n\_{k+1}) \land \forall i, j. (x(i, j), y(i, j)) \in R \}$$

*is a strong path bisimulation. Conversely, if* R *is a strong path bisimulation, then*

$$\check{R} = \{ (\operatorname{end}(x), \operatorname{end}(y)) \mid (X \xleftarrow{x} Jv \xrightarrow{y} Y) \in R \}$$

*is a stuttering branching bisimulation.*

The same reasoning can be made for weak and path bisimulations:

**Theorem 6.9.** *If $R$ is a weak bisimulation from $X$ to $Y$, then*

$$\hat{R} = \{ X \xleftarrow{x} Jv \xrightarrow{y} Y \mid (end(x), end(y)) \in R \}$$

*is a path bisimulation. If $R$ is a path bisimulation, then $\check{R}$ is a weak bisimulation.*

In total, we can describe branching and weak bisimilarity by categorical bisimilarity notions, as summarized in Table 1.

## **7 Conclusions and Future Work**

In this paper, we investigate bisimilarities of weighted and probabilistic systems through the theory of open maps. After showing that the usual theory cannot capture weights, we provide a faithful extension of the theory by the notion of mergings. The new theory has similar properties (equivalence relation, characterization as sets of spans, restriction to generators) as classical open maps but also captures bisimilarity of weighted systems and even branching bisimilarity.

The new instances come at the cost of more parameters to the theory. It remains for future work whether the parameters E, S can be combined in a single path category with two morphism classes and morphism factorizations. It would also be illuminating to know whether this new theory satisfies the axioms of a *class of open maps* from [15], in particular for toposes of coalgebras [14].

For the framework as presented, we would like to formally relate it to coalgebra – as this has been done for non-deterministic systems [19,25]. Furthermore, we would like to investigate how system semantics of true concurrency, such as Higher Dimensional Automata [21] can be integrated. Designing open maps for them turned out to be complicated (see [8]), but a hope would be that the addition of mergings would allow modeling homotopy more naturally.

Finally, it would be interesting to see whether our theory captures quantitative extensions of systems classically modeled by open maps, such as probabilistic and quantum extensions of Petri nets and event structures (see [24] for example).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Preservation and Reflection of Bisimilarity via Invertible Steps

Ruben Turkenburg<sup>1</sup>, Clemens Kupke<sup>2</sup>, Jurriaan Rot<sup>1</sup>, and Ezra Schoen<sup>2</sup>

<sup>1</sup> Institute for Computing and Information Sciences (iCIS), Radboud University, Nijmegen, The Netherlands

ruben.turkenburg@ru.nl

<sup>2</sup> Department of Computer and Information Sciences, Strathclyde University, Glasgow, UK

Abstract. In the theory of coalgebras, distributive laws give a general perspective on determinisation and other automata constructions. This perspective has recently been extended to include so-called weak distributive laws, covering several constructions on state-based systems that are not captured by regular distributive laws, such as the construction of a belief-state transformer from a probabilistic automaton, and ultrafilter extensions of Kripke frames.

In this paper we first observe that weak distributive laws give rise to the more general notion of what we call an invertible step: a pair of natural transformations that allows one to move coalgebras along an adjunction. Our main result is that part of the construction induced by an invertible step preserves and reflects bisimilarity. This covers results that have previously been shown by hand for the instances of ultrafilter extensions and belief-state transformers.

Keywords: Coalgebra · Bisimulations · Weak distributive laws

## 1 Introduction

Distributive laws between a monad T and a functor B are ubiquitous in the theory of coalgebras. They capture various forms of interaction between algebras and coalgebras, including structural operational semantics [45,33], efficient proof techniques [9] and a general coalgebraic determinisation procedure which applies to a wide range of automata and other state-based systems [43,15,29].

The central idea of this general determinisation procedure is to interpret coalgebras in the Eilenberg-Moore category EM(T), as coalgebras for a lifting of B that arises from the distributive law. Behavioural equivalence in EM(T) then amounts to desired notions of equivalence. For instance: language equivalence of non-deterministic automata; weighted automata [7]; Mealy and Moore machines with side-effects [43]; or various types of trace equivalence of transition systems [8].

An illustrative *non-example* of this general determinisation procedure is in a natural construction of belief-state transformers from probabilistic automata, which feature both non-determinism and probabilities. From a categorical perspective, the problem is related to the classical result that there is no suitable distributive law of the probability distribution monad D over the powerset monad P [46] (also see [47,34] for other non-existence results of distributive laws). Hence, general determinisation via distributive laws seems not applicable here.

Nevertheless, in [12] a concrete coalgebraic account of the construction of belief-state transformers is given, in terms of a two-stage process:


A key result in op. cit. is that the second stage preserves and reflects behavioural equivalence. This shows that behavioural equivalence of coalgebras in EM(D) coincides with distribution bisimilarity on the belief-state transformer.

In [12,21] it was shown that this construction, in fact, arises from a canonical weak distributive law of D over P [22]. Weak distributive laws correspond to so-called weak liftings [19], and—as shown in [22]—these yield a new generalised determinisation procedure which covers the above example, and precisely instantiates to the two stages above. Further examples are the treatment of alternating automata via weak distributive laws in [23], and weak distributive laws for combining non-determinism with semimodules in [10].

However, the result for probabilistic automata that the second stage above preserves and reflects behavioural equivalence has not yet been accounted for in the abstract theory of determinisation via weak distributive laws.

In this paper we provide such an account, starting from a more general setting than weak distributive laws: what we call invertible steps. These basically replace the Eilenberg-Moore adjunction inherent in the weak liftings approach by a general adjunction. In this context, a step allows one to lift the left adjoint to coalgebras—this is a widely occurring phenomenon, for instance in the semantics of coalgebraic modal logic, testing semantics and trace semantics (see [41] for an overview). The key idea here is to assume a right inverse, allowing the lifting of the right adjoint, such that we generalise the two-stage construction above.

We show that, in this setting of an invertible step, the second stage of the two-stage construction preserves and reflects bisimilarity, under mild conditions. As a consequence, we recover the above-mentioned results on preservation and reflection of behavioural equivalence for probabilistic automata [12] for free from the abstract theory.<sup>3</sup> Another motivating example is that of coalgebras for the Vietoris functor on the category of Stone spaces: we obtain that bisimilarity is preserved and reflected by the forgetful functor, recovering the main result in [5].

In fact, the latter example is related to a coalgebraic presentation [36] of ultrafilter extensions, a standard construction in modal logic [6]. It fits within the general setting of invertible steps, but not directly in weak liftings, as it involves the category of Stone spaces (for the duality with Boolean algebras). However, if we move from Stone spaces to compact Hausdorff spaces, then the relevant weak lifting (or invertible step) arises precisely from the weak distributive law constructed by Garner [19]. The weak distributive law in *loc. cit.* thus gives rise to ultrafilter extensions in modal logic.

<sup>3</sup> We focus on bisimilarity, but our setting allows for an easy argument that this coincides with behavioural equivalence in this and many related examples.

Finally, we include an example of an invertible step involving Set<sup>op</sup> instead of an Eilenberg-Moore category. Steps for adjunctions with opposite categories are a standard way of presenting the semantics of coalgebraic modal logic [40,32]. The included example shows the generality of the approach.

*Outline.* Section 2 presents (invertible) steps, the relation to weak liftings and distributive laws, and a range of examples. In Section 3 we recall the standard notion of coalgebraic bisimilarity, defined via relation lifting. Section 4 contains the main results on preservation and reflection of bisimilarity. In Section 5 we discuss applications and instances of these results. We discuss other notions of bisimulation, and future work, in Section 6.

## 2 Forward and Backward Steps

We briefly present the required theory of steps, first termed as such in [41]. This structure occurs already in work on coalgebraic modal logic [35,14,40,32,17,38] where a step gives the one-step semantics of a logic. In existing work, only what we call a *forward* step is considered. Here, we also speak of *backward* steps, being arrows in the opposite direction. In the sequel, such forward and backward steps will usually be each other's (one-sided) inverses, referred to as *invertible steps*.

Next, we recall how such steps give rise to liftings of functors between categories of coalgebras and further, when the adjunction underlying the steps can also be lifted to coalgebras [27]. Finally, we present examples of invertible steps from the literature, which we return to in later sections.

For a functor B : C → C, a *coalgebra* is a pair (X, f) consisting of an object X and an arrow f : X → BX. A homomorphism from (X, f) to (Y, g) is an arrow h : X → Y such that g ◦ h = Bh ◦ f. Coalgebras and homomorphisms between them form a category, denoted by Coalg(B), or Coalg<sub>C</sub>(B) if we wish to make the underlying category explicit.
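As a concrete illustration of the homomorphism condition, the following Python sketch takes B to be the powerset functor on finite sets, with coalgebras and maps encoded as dictionaries. The encoding, the helper names and the two example systems are assumptions made purely for illustration.

```python
def powerset_map(h, subset):
    """Action of the powerset functor on a function h (a dict): direct image."""
    return frozenset(h[x] for x in subset)

def is_homomorphism(h, f, g):
    """Check g ∘ h = P(h) ∘ f for coalgebras f : X -> P(X) and g : Y -> P(Y)."""
    return all(g[h[x]] == powerset_map(h, f[x]) for x in f)

# Hypothetical example: a three-state system collapsing onto a two-state one.
f = {0: frozenset({1, 2}), 1: frozenset(), 2: frozenset()}
g = {"a": frozenset({"b"}), "b": frozenset()}
h = {0: "a", 1: "b", 2: "b"}
not_h = {0: "a", 1: "a", 2: "b"}

print(is_homomorphism(h, f, g))      # True
print(is_homomorphism(not_h, f, g))  # False
```

The second map fails because the successors of state 0 are sent to both "a" and "b", whereas "a" only has the successor "b".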

The category of sets and functions is denoted by Set. For a monad T, we write EM(T) for the category of Eilenberg-Moore algebras. The powerset monad is denoted by P : Set → Set, given on objects by P(X) = {S | S ⊆ X}, and the finitely-supported distribution monad by D : Set → Set, given by D(X) = {ϕ : X → [0, 1] | Σ<sub>x∈X</sub> ϕ(x) = 1, supp(ϕ) finite} (see also [12]).
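To make the two functors tangible, here is a small Python sketch of their object parts on finite data, together with their usual action on functions (direct image for P, pushforward of probability mass for D). The action on functions is the standard one and is not spelled out in the text; the dictionary encoding and all names are illustrative assumptions.

```python
from fractions import Fraction

def is_distribution(phi):
    """phi is a dict from elements to Fractions; a dict has finite support."""
    return all(0 <= p <= 1 for p in phi.values()) and sum(phi.values()) == 1

def dist_map(h, phi):
    """Action of D on a function h: push the probability mass forward along h."""
    out = {}
    for x, p in phi.items():
        out[h(x)] = out.get(h(x), Fraction(0)) + p
    return out

def powerset_map(h, subset):
    """Action of P on a function h: direct image."""
    return frozenset(h(x) for x in subset)

def parity(n):
    return n % 2

phi = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

print(is_distribution(phi))                   # True
print(dist_map(parity, phi))                  # mass 2/3 on 0 and 1/3 on 1
print(powerset_map(parity, {0, 1, 2}))        # the subset {0, 1}
```

Exact fractions are used so that the condition that the probabilities sum to 1 can be checked without rounding errors.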

## 2.1 Invertible Steps

The basic setting of interest in this work consists of the following:

Definition 2.1. *Given an adjunction* P ⊣ Q : D → C *and endofunctors* B : C → C *and* L : D → D *as in the diagram*

$$B \circlearrowright \mathcal{C} \underset{Q}{\overset{P}{\rightleftarrows}} \mathcal{D} \circlearrowleft L \tag{1}$$

*a* (forward) step *is a natural transformation* δ : BQ → QL*. A* backward step *is simply a natural transformation* ι: QL → BQ *going the other way. If, moreover,* δ ◦ ι = id *then we call* δ *an* invertible step *(with right inverse* ι*). Finally, if* δ *witnesses an isomorphism then we call it an* isomorphic step*.*

Notice the asymmetry in the definition of invertible step: ι is always assumed to be a *right* inverse of δ. These invertible steps are the main focus of this paper. Examples are given below in Section 2.2.

Step-induced liftings. There is a bijective correspondence between a step and its *mate* δ̂ : PB → LP, given by the composite

$$PB \xrightarrow{PB\eta} PBQP \xrightarrow{P\delta P} PQLP \xrightarrow{\varepsilon LP} LP$$

(see [37,31]). This mate and the backward step allow us to define liftings of P and Q to the categories of coalgebras for B and L.

Definition 2.2. *Given steps* δ : BQ → QL *and* ι : QL → BQ*, the* step-induced coalgebra liftings P̄ : Coalg(B) → Coalg(L) *and* Q̄ : Coalg(L) → Coalg(B) *of* P *and* Q *are defined by*

$$f\colon X \to BX \quad \mapsto \quad \hat{\delta}\_X \circ Pf\colon PX \to LPX \tag{2}$$

$$g \colon Y \to LY \quad \mapsto \quad \iota\_Y \circ Qg \colon QY \to BQY \tag{3}$$

*on objects and act as* P *and* Q *on arrows. This is well-defined due to functoriality of* P *and* Q *and naturality of* δ̂ *and* ι*.*

It is shown in [27, Theorem 2.14] that, when δ and ι form an isomorphism, the adjunction P ⊣ Q lifts to an adjunction P̄ ⊣ Q̄ between the step-induced liftings. For our purposes it will be useful to split the isomorphism condition into the cases where ι is the left or right inverse of δ.

Lemma 2.3. *If* δ ◦ ι = id*, then the counit* ε : PQ → Id *of the adjunction* P ⊣ Q *lifts to a natural transformation* ε̄ : P̄Q̄ → Id*. If* ι ◦ δ = id*, then the unit* η : Id → QP *of the adjunction lifts to a natural transformation* η̄ : Id → Q̄P̄*.*

The combination of these two liftings gives us the lifting of the adjunction.

Corollary 2.4. *If* δ *and* ι *form an isomorphism, then* P̄ ⊣ Q̄*.*

In such a situation, Q̄ (being a right adjoint) preserves the final coalgebra for L (the limit of the empty diagram) when this exists. However, there are a number of known examples where the step is not an isomorphism; instead we only have a one-sided inverse. We consider, in particular, these invertible steps, and in the next subsection give a number of examples of this setting.

## 2.2 Steps from weak liftings, and other examples

*Example 2.5.* Our first example arises from the work of Garner, who shows that the Vietoris monad V on the category CHaus of compact Hausdorff spaces arises as a so-called *weak lifting* of the powerset monad [19] (we discuss weak liftings in general after this example). For the definition of the Vietoris monad the reader is referred to [19, Sec. 2.3]. The category CHaus is equivalent to the Eilenberg-Moore category EM(β) of the ultrafilter monad β [39]. The weak lifting provided by Garner consists of natural transformations ι, δ, satisfying δ ◦ ι = id:

$$\mathcal{P} \circlearrowright \mathsf{Set} \underset{U}{\overset{F}{\rightleftarrows}} \mathcal{EM}(\beta) \circlearrowleft \mathcal{V} \qquad \qquad U\mathcal{V} \xrightarrow{\iota} \mathcal{P}U \xrightarrow{\delta} U\mathcal{V} \tag{4}$$

where F ⊣ U is the Eilenberg-Moore adjunction of β. Notice that δ is an invertible step, with right inverse ι. As shown by Garner, a component δ<sub>X</sub> : PUX → UVX sends each subset S ∈ PUX to its topological closure. The components of ι simply include the closed subsets into the powerset.

It turns out that this invertible step gives rise to ultrafilter extensions of Kripke frames. In modal logic, ultrafilter extensions [6,20,4] are a construction taking a Kripke frame (which we can see as a coalgebra for the powerset functor P) with state space W and forming a new Kripke frame with states being ultrafilters over W. The central motivation for this is in "bisimilarity-somewhere-else" results: two states are modally equivalent iff they are bisimilar in the ultrafilter extension.

Now, the composition of the step-induced coalgebra liftings F̄ : Coalg(P) → Coalg(V) and Ū : Coalg(V) → Coalg(P) precisely yields the ultrafilter extension of a Kripke frame. The first stage β is the actual extension, which turns the Kripke frame into a V-coalgebra. The second stage Ū turns this back into a Kripke frame, i.e., a powerset coalgebra in Set.

In [36], ultrafilter extensions are developed more generally for coalgebras for a functor B : Set → Set, via the duality between Boolean algebras and Stone spaces. In fact, since both V and the left adjoint F restrict to the category Stone of Stone spaces, the invertible step δ, ι restricts to an invertible step in the restriction of the above adjunction to Stone.

In general, for monads S, T on a category C, Garner [19] defines S̃ : EM(T) → EM(T) to be a weak lifting of S if there are natural transformations

$$U\tilde{S} \xrightarrow{\iota} SU \xrightarrow{\delta} U\tilde{S} \tag{5}$$

with δ ◦ ι = id and satisfying further axioms, where U denotes the forgetful functor from EM(T) to C. Garner shows that there is a bijective correspondence between weak distributive laws of T over S and weak liftings of S to EM(T), in case idempotents in C split (which holds for Set). Here, we do not assume a monad structure on S (which is why the additional axioms are not relevant). In this case, a weak lifting is precisely an invertible step, where the underlying adjunction is an Eilenberg-Moore adjunction.

*Example 2.6.* In [11,12], a procedure is given for "determinising" probabilistic automata (PAs), which model systems with both non-determinism and probabilities, into belief state transformers. It was shown in [22] that this is an instance of a more general determinisation procedure induced by a weak lifting, which in turn corresponds to a canonical weak distributive law.

Stated for a general monad T with the usual Eilenberg-Moore adjunction F ⊣ U : EM(T) → C, this general determinisation procedure thus starts from an invertible step (weak lifting) δ : BU → UB̄. This gives rise to a two-step process:

$$\mathsf{Coalg}\_{\mathcal{C}}(BT) \xrightarrow{\overline{\mathcal{F}}} \mathsf{Coalg}\_{\mathcal{E}\mathcal{M}(T)}(\overline{B}) \xrightarrow{\overline{U}} \mathsf{Coalg}\_{\mathcal{C}}(B) \tag{6}$$

where the second functor Ū is simply the step-induced lifting of U. The first is a variation of a step-induced lifting (notice that it takes BT-coalgebras rather than B-coalgebras as input), mapping a coalgebra f : X → BTX to

$$FX \xrightarrow{Ff} FBUFX \xrightarrow{\hat{\delta}\_{UFX}} \overline{B}FUFX \xrightarrow{\overline{B}\varepsilon\_{FX}} \overline{B}FX,$$

where ε is the counit of the Eilenberg-Moore adjunction. In fact, this can be viewed as a step-induced lifting for BT which arises by composing δ and the counit, see [41].

We instantiate this to the Eilenberg-Moore adjunction of the distribution monad D, where P<sub>c</sub> is the convex powerset monad:

$$\mathcal{P} \circlearrowright \mathsf{Set} \underset{U}{\overset{F}{\rightleftarrows}} \mathcal{EM}(\mathcal{D}) \circlearrowleft \mathcal{P}\_c \tag{7}$$

We take P<sub>c</sub>(X) to have as underlying set {S ⊆ X | S convex} following [22]. This matches the usage of P<sub>ne</sub> + 1 and P<sub>c</sub> + 1 in [12], where P<sub>ne</sub> and P<sub>c</sub> are defined to exclude the empty set. A subset is convex if it is closed under convex combinations (see [12] for details). Further, the category EM(D) is equivalent to the category of convex algebras and convex maps.

It is explained in [22, Sec. 5] that we have an invertible step in the setting of Eq. (7), which sends a subset X to its convex hull (the smallest convex set containing X) and that the lifting F̄ of (6) then gives the transformation of a probabilistic automaton into a belief state transformer in the category EM(D). The second step is then to transfer the obtained belief state transformer back to Set with the step-induced lifting of U. As shown in [12] and later recovered from our abstract theory (Section 5), this yields a system with the same behaviour. In fact, this is done for automata with labels, i.e., for the functors P<sup>L</sup> and P<sub>c</sub><sup>L</sup> with L a set of labels. The weak lifting we will require in this context is given in [21].

*Example 2.7.* The following example from automata and languages considers a dual adjunction P ⊣ Q : D<sup>op</sup> → C. One motivation to discuss this kind of example stems from coalgebraic modal logic where C commonly is some category of 'spaces' and D commonly is a category of 'algebras' [32]. The setup is as follows:

$$B \circlearrowright \mathsf{Set} \underset{2^{-}}{\overset{2^{-}}{\rightleftarrows}} \mathsf{Set}^{\mathsf{op}} \circlearrowleft L \qquad \qquad 2^{L-} \xrightarrow{\iota} B(2^{-}) \xrightarrow{\delta} 2^{L-} \tag{8}$$

Here, we have BX = 2 × (PX)<sup>Σ</sup> and LX = 1 + Σ × X for a fixed alphabet Σ. The step δ is given by

$$\delta(i,\xi) = \{\mathsf{inl}(\*) \mid i = 1\} \cup \{\mathsf{inr}(a, x) \mid a \in \Sigma, x \in \bigcup \xi(a)\}\tag{9}$$

334 R. Turkenburg et al.

This step δ is invertible, e.g., by ι as in Eq. (10).

$$\iota(u) = (1 \text{ iff } \text{inl}(\*) \in u, a \mapsto \{v \mid \{(a, x) \mid x \in v\} \subseteq u\}) \tag{10}$$

A B-coalgebra is a non-deterministic automaton. An L-coalgebra in Set<sup>op</sup> is an algebra X ← 1 + Σ × X in Set, which can be seen as specifying the initial state and transition structure of a deterministic automaton. From this point of view, the coalgebra lifting Q̄ : Coalg(L) → Coalg(B) can be seen as first reversing, and then performing a powerset construction. The specific powerset construction might depend on the chosen right inverse ι, as it is not unique. For ι as in (10), for example, there is an a-transition from u to v in Q̄(A) if and only if each state in v is reachable from a state in u via an a-transition in the reverse of A.
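The claim that ι is a right inverse of δ, but not a left inverse, can be checked by brute force on a small instance. The following Python sketch does so for a two-letter alphabet and a two-element set X. The tuple encoding of inl(∗) and inr(a, x), the reading of the pairs (a, x) in Eq. (10) as inr(a, x), and the concrete choices of Σ and X are all assumptions made for this illustration.

```python
from itertools import combinations, product

SIGMA = ("a", "b")  # hypothetical alphabet Σ
X = (0, 1)          # hypothetical set X
INL = ("inl",)      # encodes inl(*); inr(a, x) is encoded as ("inr", a, x)

def subsets(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

LX = [INL] + [("inr", a, x) for a in SIGMA for x in X]  # LX = 1 + Σ × X

def delta(i, xi):
    """Eq. (9): xi sends each letter to a family of subsets of X."""
    out = {INL} if i == 1 else set()
    for a in SIGMA:
        for v in xi[a]:
            out |= {("inr", a, x) for x in v}
    return frozenset(out)

def iota(u):
    """Eq. (10), reading the pairs (a, x) there as inr(a, x)."""
    i = 1 if INL in u else 0
    xi = {a: frozenset(v for v in subsets(X)
                       if all(("inr", a, x) in u for x in v))
          for a in SIGMA}
    return i, xi

# delta ∘ iota = id, checked on every subset u of LX.
right_inverse = all(delta(*iota(u)) == u for u in subsets(LX))

# iota ∘ delta differs from the identity on some element of B(2^X).
families = subsets(subsets(X))
not_left_inverse = any(
    iota(delta(i, dict(zip(SIGMA, fams)))) != (i, dict(zip(SIGMA, fams)))
    for i in (0, 1)
    for fams in product(families, repeat=len(SIGMA))
)

print(right_inverse, not_left_inverse)  # True True
```

One witness for the second check is the element whose families are all empty: δ sends it to the empty set, and ι sends the empty set to the element whose families each contain exactly the empty subset.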

In Section 5 we return to these examples and show how we can apply the techniques from Section 4 to obtain preservation and reflection of bisimilarity.

## 3 Relations, Liftings and Coalgebraic Bisimulations

We recall the standard notion of coalgebraic bisimulation defined via relation lifting, broadly following [30,28]. We use some terminology from the theory of fibrations for conciseness, and many of the coming results can be generalised to a larger class of fibrations. Knowledge of fibrations is not required, however, as we give a self-contained presentation of the fibration of relations.

We make the following assumptions for the remainder of the paper:

Assumption 3.1. *We assume categories* C, D *with all finite limits, and factorisation systems* (E<sub>1</sub>, M<sub>1</sub>)*,* (E<sub>2</sub>, M<sub>2</sub>) *respectively for which* M<sub>1</sub> = Mono<sub>C</sub>, M<sub>2</sub> = Mono<sub>D</sub> *and for any left adjoint functor* P : C → D *we have* P(E<sub>1</sub>) ⊆ E<sub>2</sub>*.*

We assume finite limits mainly for binary products and pullbacks to allow the definitions of relations and inverse images. The assumption that maps in M are monos means that pullbacks of abstract monos and factorisation both yield monos, which represent subobjects. The final condition specifies that left adjoints preserve abstract epis. This is required in Section 4.2 and holds, e.g., when the involved categories possess a (RegEpi, Mono)-factorisation system [16,2], as in all our examples from Sections 2.2 and 5.

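For a category C satisfying the above, the category Rel(C) consists of:

- objects: relations R ↣ X × X, i.e., subobjects of a product X × X in C;
- arrows from R ↣ X × X to S ↣ Y × Y: maps u : X → Y in C for which u × u restricts to a map R → S, as in the commuting square

$$\begin{array}{ccc} R & \dashrightarrow & S \\ \downarrow & & \downarrow \\ X \times X & \xrightarrow{u \times u} & Y \times Y \end{array} \tag{11}$$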

In Set, these are subsets of the binary product of underlying sets as usual, and maps between relations constitute maps between the products sending R to S, i.e., xRy implies u(x) S u(y). Objects of Rel(Stone) are closed relations, as the image of a mono representing a subobject is homeomorphic to its domain, and images of continuous functions are compact and thus closed. In the case of an Eilenberg-Moore category for a monad T, objects of Rel(EM(T)) are congruences, as the map into the product is an algebra morphism.

Remark 3.2. A note on notation: we use for epis and for monos and the subobjects they represent. We use for abstract epis and for abstract monos, i.e., maps in E and M respectively.

Using the factorisation system on D, we lift a functor F : C → D to a functor Rel(F) : Rel(C) → Rel(D). The action on objects is given by the factorisation

$$\begin{array}{ccccc} FR & \xrightarrow{Fr} & F(X \times X) & \xrightarrow{\langle F\pi\_1, F\pi\_2\rangle} & FX \times FX \\ & {\scriptstyle e}\searrow & & \nearrow{\scriptstyle m} & \\ & & \mathsf{Rel}(F)(R) & & \end{array} \tag{12}$$

The action on arrows is defined by orthogonality. The resulting functor Rel(F) is a lifting in the sense that the following diagram commutes

$$\begin{array}{ccc} \mathsf{Rel}(\mathcal{C}) & \xrightarrow{\mathsf{Rel}(F)} & \mathsf{Rel}(\mathcal{D}) \\ {\scriptstyle p}\downarrow & & \downarrow{\scriptstyle q} \\ \mathcal{C} & \xrightarrow{F} & \mathcal{D} \end{array} \tag{13}$$

where p : Rel(C) → C sends a relation R ↣ X × X to the object X, and similarly for q. We say (following the terminology of fibrations) that the relation R is above the object X and a map between relations is above the map u from Eq. (11). Note that commutativity of diagram (13) expresses that Rel(F), applied to a relation R ↣ X × X on X, yields a relation on FX.
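For the powerset functor on Set, the factorisation in Eq. (12) can be computed directly on finite relations. The Python sketch below builds Rel(P)(R) as the image of the map sending a subset W ⊆ R to its two projections, and compares it with the familiar Egli-Milner description of the lifted relation. The helper names and the example relation are illustrative assumptions.

```python
from itertools import combinations

def subsets(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def rel_powerset(R):
    """Rel(P)(R): image of <P(pi_1), P(pi_2)> : P(R) -> P(X) x P(X)."""
    return {(frozenset(x for x, _ in W), frozenset(y for _, y in W))
            for W in subsets(R)}

def egli_milner(R, X):
    """Pairs (S, T) where every x in S relates to some y in T and vice versa."""
    return {(S, T) for S in subsets(X) for T in subsets(X)
            if all(any((x, y) in R for y in T) for x in S)
            and all(any((x, y) in R for x in S) for y in T)}

X = (0, 1, 2)                      # hypothetical carrier
R = {(0, 0), (0, 1), (1, 2)}       # hypothetical relation on X

print(rel_powerset(R) == egli_milner(R, X))  # True
```

The result is a set of pairs of subsets of X, i.e., a relation on P(X), which is the content of the commuting diagram (13) in this instance.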

Given a category of relations Rel(C), called the total category, the subcategory (also called a fibre) Rel<sub>X</sub> consists of objects R ↣ X × X and maps above the identity on X. For relations in Set, such maps are inclusions of relations. In general, these maps are unique, and writing R ≤ S iff there is an arrow from R to S turns the fibre into a poset. A relation lifting Rel(F) can be restricted to the fibres to give a functor Rel(F)<sub>X</sub> : Rel<sub>X</sub> → Rel<sub>FX</sub>. Since Rel<sub>X</sub> and Rel<sub>FX</sub> are posetal categories, Rel(F)<sub>X</sub> can be viewed as a monotone map.

For a map f : X → Y in C, we have the direct image and inverse image functors ∐<sub>f</sub> : Rel<sub>X</sub> → Rel<sub>Y</sub> and f<sup>∗</sup> : Rel<sub>Y</sub> → Rel<sub>X</sub>. For relations on sets, we have ∐<sub>f</sub>(R ⊆ X × X) = {(f(x), f(y)) | (x, y) ∈ R} and f<sup>∗</sup>(S ⊆ Y × Y) = {(x, y) ∈ X × X | (f(x), f(y)) ∈ S}. More generally, they are obtained as the factorisation and pullback in the left and right diagram below respectively

$$\begin{array}{ccc} R & \twoheadrightarrow & \coprod\_f(R) \\ {\scriptstyle r}\downarrow & & \downarrow \\ X \times X & \xrightarrow{f \times f} & Y \times Y \end{array} \qquad \qquad \begin{array}{ccc} f^\*(S) & \longrightarrow & S \\ \downarrow & & \downarrow{\scriptstyle s} \\ X \times X & \xrightarrow{f \times f} & Y \times Y \end{array} \tag{14}$$

It can further be shown that ∐<sub>f</sub> ⊣ f<sup>∗</sup>. We say that Rel(F) : Rel(C) → Rel(D) preserves inverse images if Rel(F)<sub>X</sub> ◦ f<sup>∗</sup> = (Ff)<sup>∗</sup> ◦ Rel(F)<sub>Y</sub>.
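For relations on finite sets, the adjunction between direct and inverse image amounts to the equivalence that ∐<sub>f</sub>(R) ⊆ S holds exactly when R ⊆ f<sup>∗</sup>(S). The Python sketch below checks this exhaustively for one small function; the sets, the function and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations, product

def subsets(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def direct_image(f, R):
    """Direct image of a relation R on X along f : X -> Y (f is a dict)."""
    return frozenset((f[x], f[y]) for x, y in R)

def inverse_image(f, S):
    """Inverse image of a relation S on Y along f : X -> Y."""
    return frozenset((x, y) for x in f for y in f if (f[x], f[y]) in S)

X = (0, 1, 2)
Y = ("p", "q")
f = {0: "p", 1: "p", 2: "q"}   # hypothetical map f : X -> Y

rels_X = subsets(product(X, X))  # all 2^9 relations on X
rels_Y = subsets(product(Y, Y))  # all 2^4 relations on Y

adjunction_holds = all(
    (direct_image(f, R) <= S) == (R <= inverse_image(f, S))
    for R in rels_X for S in rels_Y
)
print(adjunction_holds)  # True
```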

In this context, *a bisimulation for a* B*-coalgebra* f : X → BX is a post-fixed point of the endofunctor f<sup>∗</sup> ◦ Rel(B)<sub>X</sub> : Rel<sub>X</sub> → Rel<sub>X</sub>, i.e., a relation R ↣ X × X such that R ≤ f<sup>∗</sup> ◦ Rel(B)<sub>X</sub>(R). Bisimilarity is then obtained as the greatest fixed point ν(f<sup>∗</sup> ◦ Rel(B)<sub>X</sub>), if it exists. In Set a bisimulation is a relation R such that R ⊆ (f × f)<sup>−1</sup>(Rel(B)(R)), i.e., if xRy then f(x) Rel(B)(R) f(y).
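On a finite powerset coalgebra, this greatest fixed point can be reached by iterating f<sup>∗</sup> ◦ Rel(P)<sub>X</sub> downwards from the full relation. The Python sketch below does this, using the Egli-Milner form of Rel(P); it is an illustration under these assumptions (finite carrier, B = P, hypothetical example system), not a general algorithm from the text.

```python
def lifted(R, S, T):
    """Whether (S, T) is in Rel(P)(R), in Egli-Milner form."""
    return (all(any((x, y) in R for y in T) for x in S)
            and all(any((x, y) in R for x in S) for y in T))

def step(f, R):
    """The monotone map f^* ∘ Rel(P)_X applied to a relation R on X."""
    return {(x, y) for x in f for y in f if lifted(R, f[x], f[y])}

def is_bisimulation(f, R):
    """R is a post-fixed point of step."""
    return set(R) <= step(f, R)

def bisimilarity(f):
    """Greatest fixed point of step, by iteration from the full relation."""
    R = {(x, y) for x in f for y in f}
    while True:
        nxt = step(f, R)
        if nxt == R:
            return R
        R = nxt

# Hypothetical system: 1, 2, 4 are deadlocked; 0 and 3 step into deadlock; 5 loops.
f = {0: {1, 2}, 1: set(), 2: set(), 3: {4}, 4: set(), 5: {5}}

sim = bisimilarity(f)
print((0, 3) in sim, (0, 5) in sim)                     # True False
print(is_bisimulation(f, {(0, 3), (1, 4), (2, 4)}))     # True
```

The iteration terminates because each round can only remove pairs from a finite relation, and monotonicity of the map ensures that the limit is the greatest fixed point.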

## 4 Preserving and Reflecting Bisimilarity

In this section we show that, in the presence of an invertible step, bisimilarity is preserved and reflected by the step-induced lifting of the right adjoint, given some further mild conditions. This allows us to recover a number of existing results for concrete instances (Section 5).

Our approach is as follows:


Throughout this section we assume categories C and D as in Assumption 3.1, and an invertible step δ : BQ → QL with right inverse ι : QL → BQ (and P, Q, B, L as in Definition 2.1).

## 4.1 Preservation and reflection

We now make precise what it means for a monotone map h to preserve and reflect bisimulations. This will be instantiated to bisimulations, captured abstractly as post-fixed points of a monotone map f : Γ → Γ on a poset Γ, which typically consists of relations (Section 3). These are compared against a second type of bisimulations, modelled as post-fixed points of another monotone map g : Δ → Δ. This motivates the following definition.

Definition 4.1. *Let* Γ *and* Δ *be posets, and* f : Γ → Γ*,* g : Δ → Δ *monotone maps. A monotone map* h : Γ → Δ preserves post-fixed points *if* x ≤ f(x) *implies* h(x) ≤ g(h(x))*. It* reflects post-fixed points *if the converse implication holds.*

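In the step setting of Eq. (1), bisimulations for B- and L-coalgebras can be represented as post-fixed points of monotone maps on posets of relations as in Section 3. More concretely, for an L-coalgebra f : X → LX, bisimulations on (X, f) are the post-fixed points of f<sup>∗</sup> ◦ Rel(L)<sub>X</sub> : Rel<sub>X</sub> → Rel<sub>X</sub>, and bisimulations on the B-coalgebra (QX, ι<sub>X</sub> ◦ Qf) are the post-fixed points of (ι<sub>X</sub> ◦ Qf)<sup>∗</sup> ◦ Rel(B)<sub>QX</sub> : Rel<sub>QX</sub> → Rel<sub>QX</sub>.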


The two can be compared via the restriction Rel(Q)<sub>X</sub> : Rel<sub>X</sub> → Rel<sub>QX</sub> of the functor Rel(Q). Indeed, our main objective is to show that in the presence of an invertible step, Rel(Q)<sub>X</sub> preserves and reflects post-fixed points representing bisimulations, and that it maps the greatest fixed point in Rel<sub>X</sub> (bisimilarity on f) to the greatest fixed point in Rel<sub>QX</sub> (bisimilarity on ι<sub>X</sub> ◦ Qf). In this context we speak about *preservation and reflection of bisimulations/bisimilarity*.

## 4.2 Proof of preservation and reflection

We are now ready to prove preservation and reflection of bisimilarity, in the sense described in the previous subsection. First, the following basic lemma provides a method of showing preservation and reflection of post-fixed points, which will be useful for our purposes.

Lemma 4.2. *Let* Γ *and* Δ *be posets, and* f : Γ → Γ*,* g : Δ → Δ *and* h: Γ → Δ *monotone maps. Suppose that* h *has a left (lower) adjoint* k : Δ → Γ*, and the equality* gh = hf *holds. Then* h *maps the greatest fixed point of* f *to the greatest fixed point of* g*, when these exist;* h *preserves post-fixed points; and if* h *is order-reflecting, then* h *reflects post-fixed points.*
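The claims of Lemma 4.2 can be unfolded in a few lines. The following is a sketch and not necessarily the argument intended by the authors. For the statement on greatest fixed points it additionally assumes that νf is also the greatest post-fixed point of f, as is the case in a complete lattice by the Knaster-Tarski theorem. It uses the unit y ≤ h(k(y)) of the adjunction between k and h.

```latex
\begin{align*}
\text{Preservation:}\quad & x \le f(x)
  \;\Longrightarrow\; h(x) \le h(f(x)) = g(h(x)). \\
\text{Reflection:}\quad & h(x) \le g(h(x)) = h(f(x))
  \;\Longrightarrow\; x \le f(x)
  \quad\text{(as $h$ is order-reflecting)}. \\
\text{Fixed points:}\quad & g(h(\nu f)) = h(f(\nu f)) = h(\nu f)
  \;\Longrightarrow\; h(\nu f) \le \nu g, \\
& \nu g = g(\nu g) \le g(h(k(\nu g))) = h(f(k(\nu g)))
  \;\Longrightarrow\; k(\nu g) \le f(k(\nu g)) \\
& \qquad\;\Longrightarrow\; k(\nu g) \le \nu f
  \;\Longrightarrow\; \nu g \le h(\nu f).
\end{align*}
```

The two inequalities in the last part combine, by antisymmetry, to h(νf) = νg.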

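Categorically speaking, the equality gh = hf is an isomorphic step. Instantiated to our setting of interest, Lemma 4.2 gives us a method for proving preservation and reflection of bisimilarity: it suffices to show, first, that Rel(Q)<sub>X</sub> has a left adjoint and, second, that (ι<sub>X</sub> ◦ Qf)<sup>∗</sup> ◦ Rel(B)<sub>QX</sub> ◦ Rel(Q)<sub>X</sub> = Rel(Q)<sub>X</sub> ◦ f<sup>∗</sup> ◦ Rel(L)<sub>X</sub> for every L-coalgebra (X, f).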


To obtain the required adjunction between the fibres Rel<sub>X</sub> and Rel<sub>QX</sub>, we first establish the adjunction Rel(P) ⊣ Rel(Q) between the total relation categories. Given Assumption 3.1, we can lift the unit and counit of the adjunction P ⊣ Q, using the transformations constructed in the following lemma.

Lemma 4.3. *Let* F : C→D *and* G: D→E *be functors, with* Rel(F): Rel(C) → Rel(D) *and* Rel(G): Rel(D) → Rel(E) *the corresponding relation liftings. Then we have a natural transformation* Rel(GF) → Rel(G) Rel(F)*. Further, if* G *preserves abstract epis, then there is also a natural transformation* Rel(G) Rel(F) → Rel(GF)*. Also, the constructed transformations are above the identity.*

We note that the first part is in [28, Exercise 4.4.6] and the result is proved for Set endofunctors in [9, Lemma 14.1]. This allows the lifting of the adjunction, which we note may also be obtainable from results on fibred adjunctions in [30,26], but a direct proof is quite straightforward; the main idea is to use Lemma 4.3 together with preservation of abstract epis by P.

Lemma 4.4. *The adjunction* P ⊣ Q : D → C *lifts to relations, i.e., the following diagram is commutative, and the unit and counit of the upper adjunction are above the unit and counit of* P ⊣ Q*.*

$$\begin{array}{ccc} \mathsf{Rel}(\mathcal{C}) & \underset{\mathsf{Rel}(Q)}{\overset{\mathsf{Rel}(P)}{\rightleftarrows}} & \mathsf{Rel}(\mathcal{D}) \\ {\scriptstyle p}\downarrow & & \downarrow{\scriptstyle q} \\ \mathcal{C} & \underset{Q}{\overset{P}{\rightleftarrows}} & \mathcal{D} \end{array} \tag{15}$$

The relation lifting defined in Section 3 allows us to define endofunctors Rel(B), Rel(L) in the context of the above adjunction:

$$\operatorname{Rel}(B) \circlearrowright \operatorname{Rel}(\mathcal{C}) \underset{\operatorname{Rel}(Q)}{\overset{\operatorname{Rel}(P)}{\rightleftarrows}} \operatorname{Rel}(\mathcal{D}) \circlearrowleft \operatorname{Rel}(L) \tag{16}$$

In this setting, we may try to lift the step δ or its converse ι to this adjunction. It turns out that δ always lifts. For ι, there is a sufficient condition which is independent of ι itself: that Q preserves abstract epis. In both cases, this result follows essentially from Lemma 4.3.

Proposition 4.5. *For a forward step* δ *and backward step* ι*, we have:*


The condition that Q preserves abstract epis holds, e.g., in case it is the forgetful functor in an adjunction monadic over Set. This is because Eilenberg-Moore categories of monads on Set have (RegEpi, Mono)-factorisation systems, and the forgetful functor sends regular epis to epis in Set as discussed in [13, Example 2.3]. It also holds in the Stone-Set case, as Stone is a reflective subcategory of CHaus (which is equivalent to the category of algebras for the ultrafilter monad).

The lifted steps δ and ι give step-induced liftings of Rel(P) and Rel(Q) between Coalg(Rel(B)) and Coalg(Rel(L)). Since bisimulations can be equivalently presented as coalgebras for Rel(B) and Rel(L), these liftings can be used to capture preservation of bisimulations. But it is less obvious what reflection means in this context and how to prove it. For reflection of bisimulations by Rel(Q), we turn our attention to the fibres, as described in the beginning of this section.

As a consequence of Proposition 4.5 and of δ ◦ ι = id, we obtain the following result, which will later be used in the construction of a step on the fibres.

Lemma 4.6. *Let* δ *be an invertible step with right inverse* ι*, and suppose* Q *preserves abstract epis. Then* Rel(Q)<sub>LX</sub> ◦ Rel(L)<sub>X</sub> = ι<sub>X</sub><sup>∗</sup> ◦ Rel(B)<sub>QX</sub> ◦ Rel(Q)<sub>X</sub>*.*

Adjoining the fibres. Next, we construct an adjunction between the fibres Rel<sub>X</sub> and Rel<sub>QX</sub>. The usual restriction Rel(Q)<sub>X</sub> of Rel(Q) to the fibre Rel<sub>X</sub> will be the right adjoint, similarly to the adjunction obtained earlier. To map back into the fibre Rel<sub>X</sub>, we post-compose Rel(P)<sub>QX</sub> with ∐<sub>ε</sub>, the direct image functor obtained from the counit of the adjunction Rel(P) ⊣ Rel(Q). We note the similarity with results on fibred adjunctions in [30], where only adjunctions over a single base category are considered.

Lemma 4.7. We have an adjunction ∐<sub>ε</sub> ◦ Rel(P)<sub>QX</sub> ⊣ Rel(Q)<sub>X</sub> : Rel<sub>X</sub> → Rel<sub>QX</sub>.

The above lemma fulfils the first proof obligation stated in the beginning of Section 4.2. It now remains to show the second proof obligation, i.e., that we have an isomorphic step in the following setting:

$$(\iota\_X \circ Qf)^\* \circ \operatorname{Rel}(B)\_{QX} \circlearrowright \operatorname{Rel}\_{QX} \underset{\operatorname{Rel}(Q)\_X}{\overset{\coprod\_{\varepsilon} \circ \operatorname{Rel}(P)\_{QX}}{\rightleftarrows}} \operatorname{Rel}\_X \circlearrowleft f^\* \circ \operatorname{Rel}(L)\_X \tag{17}$$

To this end, we first show that Rel(Q) preserves inverse images, using the fact that we can obtain inverse images as pullbacks inside the category of relations. Since Rel(Q) is a right adjoint, it preserves these pullbacks.

Lemma 4.8. Rel(Q) preserves inverse images.

We are now ready to show the existence of the required isomorphic step.

Theorem 4.9. If Q preserves abstract epis, then for any L-coalgebra (X, f):

$$(\iota\_X \circ Qf)^\* \circ \operatorname{Rel}(B)\_{QX} \circ \operatorname{Rel}(Q)\_X = \operatorname{Rel}(Q)\_X \circ f^\* \circ \operatorname{Rel}(L)\_X \tag{18}$$

Proof. We have

$$(\iota\_X \circ Qf)^\* \circ \text{Rel}(B)\_{QX} \circ \text{Rel}(Q)\_X = (Qf)^\* \circ \iota\_X^\* \circ \text{Rel}(B)\_{QX} \circ \text{Rel}(Q)\_X \tag{19}$$

$$=(Qf)^{\*} \circ \operatorname{Rel}(Q)\_{LX} \circ \operatorname{Rel}(L)\_{X} \tag{20}$$

$$=\operatorname{Rel}(Q)\_X \circ f^\* \circ \operatorname{Rel}(L)\_X \tag{21}$$

where Eq. (19) is an application of a basic fact on inverse images (technically, that the poset fibration of relations is split), Eq. (20) holds by Lemma 4.6, and Eq. (21) holds by Lemma 4.8.

We now reach our main result on preservation and reflection of bisimulations and bisimilarity by Rel(Q)X.

Theorem 4.10. Let (X, f) be an L-coalgebra. Suppose that Q preserves abstract epis. Then Rel(Q)<sub>X</sub> maps bisimilarity on (X, f) (when it exists) to bisimilarity on Q̄(X, f). Further, Rel(Q)<sub>X</sub> preserves bisimulations and, if it is order-reflecting, also reflects bisimulations.

Proof. We have seen in Lemma 4.7 that Rel(Q)<sub>X</sub> has a left adjoint, and in Theorem 4.9 that in this setting we have an isomorphic step. The result now follows from Lemma 4.2.

While this result is formulated in terms of Rel(Q)<sub>X</sub>, we will also speak of simply Q preserving and reflecting both bisimulations and bisimilarity.

As a special case of Theorem 4.10, we recover (a version of) the following existing result found in [42,3,11,12].

Lemma 4.11. Assume functors B, L : C → C, and a natural transformation ι : L → B. Then the functor Id : Coalg(L) → Coalg(B) defined by (X, f) ↦ (X, ι<sub>X</sub> ◦ f) on objects and identity on morphisms, preserves bisimulations. If additionally ι has a left inverse, Id reflects bisimulations.

We briefly turn to the condition of order-reflectingness. As we are often interested in cases where the right adjoint is a forgetful functor in the context of an Eilenberg-Moore adjunction, it is useful to state the following.

Lemma 4.12. For a monad T with forgetful functor U : EM(T) → C, the (restricted) lifting Rel(U)<sub>X</sub> is an order-reflecting map.

If C = Set in the above lemma, then Rel(U)<sub>X</sub> is just the inclusion of the poset of congruences Rel<sub>X</sub> on an algebra X into the poset of all relations on its carrier.

In that case, we can also use the above to show preservation and reflection of behavioural equivalence. Two states of a coalgebra (in Set) are behaviourally equivalent if they can be identified by some coalgebra homomorphism. This can be captured more abstractly using kernel bisimulations (see, e.g., [44]). Since U is assumed to be a forgetful functor to Set, we simply define preservation and reflection of behavioural equivalence by U to mean that for any two states x, y of an L-coalgebra (X, f), x and y are behaviourally equivalent (for (X, f)) if and only if they are behaviourally equivalent for U(X, f).

It turns out that, in our setting, coincidence of bisimilarity and behavioural equivalence for L-coalgebras reduces to coincidence for B-coalgebras. This is stated in the following lemma; the essence is that U is easily shown to preserve behavioural equivalence.

Lemma 4.13. For a monad T, consider the Eilenberg-Moore adjunction F ⊣ U : EM(T) → Set with functors L : EM(T) → EM(T) and B : Set → Set, and an invertible step δ : BU → UL. Further suppose that U preserves and reflects bisimilarity, and that B preserves weak pullbacks. Then bisimilarity and behavioural equivalence for L-coalgebras coincide (and hence, U preserves and reflects behavioural equivalence).

Remark 4.14. We conclude with a brief exploration of preservation and reflection by the restriction of the left adjoint Rel(P)<sub>X</sub>, in the setting of

$$f^\* \circ \mathsf{Rel}(B)\_X \circlearrowright \mathsf{Rel}\_X \xrightarrow{\mathsf{Rel}(P)\_X} \mathsf{Rel}\_{PX} \circlearrowleft (\hat{\delta}\_X \circ Pf)^\* \circ \mathsf{Rel}(L)\_{PX} \tag{22}$$

with f : X → BX a B-coalgebra in this case. Here, we can obtain a backward step Rel(P)<sub>X</sub> ◦ f<sup>∗</sup> ◦ Rel(B)<sub>X</sub> ≤ (δ̂<sub>X</sub> ◦ Pf)<sup>∗</sup> ◦ Rel(L)<sub>PX</sub> ◦ Rel(P)<sub>X</sub>, which means that we can lift Rel(P)<sub>X</sub> to bisimulations, so that these are preserved. However, we cannot obtain a forward step in this context, thus reflection will not hold. This is illustrated, e.g., by the example of ultrafilter extensions, where the ultrafilter monad β certainly does not reflect bisimulations: in general, in the ultrafilter extension more states will be bisimilar.

## 5 Applications

Now that we have obtained conditions for the preservation and reflection of bisimilarity, we return to the examples of Section 2.2. We will show how a number of existing non-trivial results can be recovered in a concise way. Further, the Set-Stone adjunction used in the first example is known to not be monadic, and so outside the scope of weak liftings, which indicates the generality of our results.

**Ultrafilter Extensions and Vietoris bisimulations.** In Example 2.5, we have seen how the construction of ultrafilter extensions can be obtained from an invertible step, which arises from a weak lifting described by Garner. In the current treatment of reflection and preservation of bisimilarity, we focus on the restriction of this invertible step to the category Stone.

This brings us in line with [5], where a comparison is made between bisimilarity for the Vietoris functor V : Stone → Stone and bisimilarity for the powerset functor P : Set → Set, called Vietoris-bisimilarity and Kripke-bisimilarity respectively in op. cit. More precisely, for a V-coalgebra (X, f), Kripke bisimilarity is bisimilarity on Ū(X, f), where Ū is the step-induced lifting of the forgetful functor U : Stone → Set. Vietoris bisimilarity is simply bisimilarity on the coalgebra (X, f) itself.

We consider the following results from [5]:


From the above discussion, we see that these results fit into the setting of Section 4, so that they can be recovered using our results on the preservation and reflection of bisimilarity as follows:


Indeed, preservation and reflection by Rel(U)<sub>X</sub> follows from Theorem 4.10. We have seen that U : Stone → Set preserves abstract epis, so it only remains to check that Rel(U)<sub>X</sub> is order-reflecting. This holds because Stone is a reflective (i.e. full) subcategory of CHaus, which is monadic over Set.


**PAs and Belief State Transformers.** As discussed in Example 2.6, we can determinise a PA to a coalgebra for the convex powerset functor P<sub>c</sub> : EM(D) → EM(D) using a lifting of F : Set → EM(D). The step-induced lifting of the corresponding forgetful functor U : EM(D) → Set maps the P<sub>c</sub>-coalgebra back into Set, but we must take care that this does not change its behaviour. What we can now show is that bisimilarity is preserved and reflected.

Once we know this, we can apply Lemma 4.13 to show the coincidence of bisimilarity and behavioural equivalence in the case of the convex powerset functor on EM(D) and the powerset functor on Set as this preserves weak pullbacks. This coincidence is relevant for the generalisation of the corresponding results of [12] (restricted to the convex powerset functor), which are formulated in terms of behavioural equivalence. As mentioned in Example 2.6, the weak lifting we require to cover automata with labels can be found in [21]. Consider the following:


Again, we can apply the results of Section 4 to recover these results. In fact, in [12, Proposition 6.5], the second result is proved more generally, namely for settings where a so-called *lax lifting* exists rather than the weak lifting we require.


**Automata.** For a different instance, we revisit Example 2.7 and consider the basic adjunction P ⊣ Q : D<sup>op</sup> → C. As a general remark, we note that if D admits a factorization system (E, M) with E a class of epis and M a class of monos, then (M, E) forms a factorization system for D<sup>op</sup>, with M a class of epis *in* D<sup>op</sup> and E a class of monos *in* D<sup>op</sup>. We can explicitly describe Rel(D<sup>op</sup>) as follows:


$$\begin{array}{ccc} E & \dashleftarrow & F \\ \uparrow & & \uparrow \\ X + X & \leftarrow & Y + Y \end{array} \tag{23}$$

In the case D = Set, we have E = Epi and M = Mono. Further, every epi e : X + X ↠ E is isomorphic to an epi of the form X + X ↠ (X + X)/∼ with ∼ an equivalence relation on X + X. This gives us an equivalent description of Rel(Set<sup>op</sup>):


In particular, we see that the fibre over a set X consists of all equivalence relations on X + X, ordered by reverse inclusion. Reindexing along a map u : X ← Y maps an equivalence relation ≈ on Y + Y to the least equivalence relation ∼ on X + X such that j(u(y)) ∼ j′(u(y′)) for all j(y) ≈ j′(y′).

Focusing on the setting of (8) in Example 2.7, the lifting Rel(L) is given by

$$\mathsf{inl}(\ast)\ \mathsf{Rel}(L)(\sim)\ \mathsf{inr}(\ast) \tag{24}$$

$$j((a,x))\ \mathsf{Rel}(L)(\sim)\ j'((b,y)) \iff a=b \text{ and } j(x)\sim j'(y)\tag{25}$$

If f : X ← 1 + Σ × X is an L-coalgebra, we see that f<sup>∗</sup> ◦ Rel(L)<sub>X</sub> maps an equivalence relation ∼ on X + X to the least equivalence relation ≈ satisfying

$$\text{inl}(f(\ast)) \approx \text{inr}(f(\ast))\tag{26}$$

$$j(f(a,x)) \approx j'(f(a,y)) \text{ whenever } j(x) \sim j'(y) \tag{27}$$

A post-fixed point of this map is an equivalence relation ∼ which relates inl(f(∗)) and inr(f(∗)) and is closed under the action of Σ on X + X. The greatest post-fixed point is the least such relation, as relations in Rel<sub>X</sub> are ordered by reverse inclusion. It is easy to see that this is exactly the relation which identifies inl(x) and inr(x) for those x reachable from f(∗).

Rel(Q), meanwhile, maps an equivalence relation ∼ on X + X to the relation R on 2<sup>X</sup> given by

$$u \mathrel{R} v \iff \mathsf{inl}[u] \cup \mathsf{inr}[v] \text{ is } {\sim}\text{-closed} \tag{28}$$

If X′ is the set of reachable states, we conclude that Rel(Q) maps the greatest bisimulation ∼ to the relation

$$uRv \iff u \cap X' = v \cap X' \tag{29}$$

The functor Q preserves (abstract) epis, as all epis in Set<sup>op</sup> are regular. Now, Theorem 4.10 tells us that the relation (29) coincides with bisimilarity on the automaton Q(X, f) from Example 2.7. It follows that the subautomaton on 2<sup>X′</sup> is minimal, and is the minimal automaton equivalent to Q(X, f).
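The two descriptions above can be checked by brute force on a small instance. The following Python sketch computes the least equivalence relation on X + X that relates inl(f(∗)) and inr(f(∗)) and is closed under the transitions, and then compares the relation (28) with the description (29). The three-state automaton, the tagged-pair encoding of X + X, and all function names are illustrative assumptions rather than part of the development above.

```python
from itertools import combinations

SIDES = ("inl", "inr")  # tags encoding the two copies of X in X + X


def least_closed_equivalence(states, alphabet, init, step):
    """Least equivalence on X + X relating inl(init) and inr(init) and
    closed under the transitions; returned as a union-find `find`."""
    parent = {(side, x): (side, x) for side in SIDES for x in states}

    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e

    def union(p, q):
        rp, rq = find(p), find(q)
        if rp == rq:
            return False
        parent[rp] = rq
        return True

    union(("inl", init), ("inr", init))
    changed = True
    while changed:  # saturate: related elements get related successors
        changed = False
        for p in list(parent):
            for q in list(parent):
                if find(p) == find(q):
                    for a in alphabet:
                        p_next = (p[0], step[(a, p[1])])
                        q_next = (q[0], step[(a, q[1])])
                        if union(p_next, q_next):
                            changed = True
    return find


def reachable(alphabet, init, step):
    """States reachable from the initial state."""
    seen, todo = {init}, [init]
    while todo:
        x = todo.pop()
        for a in alphabet:
            y = step[(a, x)]
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def related(u, v, states, find):
    """u R v iff inl[u] ∪ inr[v] is a union of equivalence classes, as in (28)."""
    s = {("inl", x) for x in u} | {("inr", x) for x in v}
    everything = [(side, x) for side in SIDES for x in states]
    return all(e2 in s for e in s for e2 in everything if find(e2) == find(e))


# Hypothetical automaton: state 2 is not reachable from the initial state 0.
states, alphabet, init = [0, 1, 2], ["a"], 0
step = {("a", 0): 1, ("a", 1): 0, ("a", 2): 0}

find = least_closed_equivalence(states, alphabet, init, step)
reach = reachable(alphabet, init, step)
subsets = powerset(states)
agrees = all(
    related(u, v, states, find) == ((u & reach) == (v & reach))
    for u in subsets
    for v in subsets
)
print(sorted(reach), agrees)
```

The quadratic saturation loop is chosen for readability only; it is adequate for toy examples but is not meant as an efficient procedure.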

## 6 Discussion and Future Work

We studied the notion of an *invertible step*, which provides several constructions on coalgebras via functor liftings. We showed that the lifting of the right adjoint, induced by such an invertible step, preserves and reflects bisimilarity. This abstract result instantiates to several concrete results from the literature, in examples related to ultrafilter extensions and weak distributive laws.

We have focused on preservation and reflection of bisimilarity, defined in terms of relation lifting. There are several other coalgebraic notions of behavioural equivalence and bisimilarity [44]—we discuss these in the next subsection. Finally, in Section 6.2 we list directions for future work.

## 6.1 Remarks on other notions of bisimulation

**Aczel-Mendler bisimulations.** For a coalgebra f : X → LX, an Aczel-Mendler bisimulation R ↣ X × X is defined by the existence of an L-coalgebra structure R → LR on R such that the projection maps are coalgebra homomorphisms [1].

In the invertible step setting, applying a lifting Q to such a bisimulation yields a structure QR → BQR. However, this is not immediately a bisimulation, as QR may not be a relation. We can obtain a relation by taking the image of ⟨Qπ<sub>1</sub>, Qπ<sub>2</sub>⟩, as we do to define relation lifting, but in general this is a Hermida-Jacobs bisimulation [28, Exercise 4.5.2], rather than an Aczel-Mendler one.

On the other hand, if we wish to speak of reflection of Aczel-Mendler bisimulations, we start with a span QX ← R → QX and try to construct a relation on X. Using the adjunction of the step setting, we can transpose the projections to obtain a span X ← PR → X. Again PR is not immediately a relation in general, and taking the image yields a Rel(L)-coalgebra (not an L-coalgebra) as the projections and the counit ε are coalgebra homomorphisms (see also [28, Exercise 4.5.4]). This in fact comes down to the same as the left adjoint ∐<sub>ε</sub> ◦ Rel(P)<sub>QX</sub> constructed earlier. There we factorise to obtain the relation lifting and factorise again for the direct image of ε, instead of factorising the paired transposes defined using ε. We also do not explicitly use that ε is a coalgebra homomorphism (although this follows from the step with right inverse and Lemma 2.3); instead we lift the adjunction at the level of relations to give a map between bisimulations. This is part of the motivation for the use of relation liftings and the corresponding notion of bisimulations.

Going further, it is shown in [5] that there exists a Vietoris bisimulation which is not an Aczel-Mendler bisimulation and, stronger, that there exist Vietoris coalgebras with states which can be related by a Vietoris but not an Aczel-Mendler bisimulation. Thus, the correspondences between bisimulations on Set and Stone we have discussed in the previous sections are not obtainable when we consider Aczel-Mendler bisimulations.

**Kernel bisimulations/behavioural equivalence.** In applying our results to the preservation and reflection of behavioural equivalence, we currently work concretely, considering sets of states and identification of elements.

We prefer to work more abstractly, as we have done for bisimilarity. To this end, we may consider kernel bisimulations. A relation R ↣ X × X is a kernel bisimulation on a coalgebra (X, f : X → LX) in a category D, if it is the pullback of morphisms X → Z ← X in D forming a cospan of coalgebra homomorphisms (X, f) → (Z, z) ← (X, f) in Coalg<sub>D</sub>(L). In a concrete setting this coincides with behavioural equivalence, as such a pullback contains exactly the pairs of elements of X which are identified in Z by the morphisms forming the cospan. We can thus view this as a generalisation of behavioural equivalence as defined earlier.
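In the concrete case D = Set, the pullback in this definition can be computed directly. The sketch below is only an illustration under assumptions not made in the text: it fixes the functor LX = O × X (one output and one successor per state), represents coalgebras and homomorphisms as Python dictionaries, and uses hypothetical names throughout.

```python
def is_homomorphism(h, f, z):
    """Check that h : X -> Z is a coalgebra homomorphism (X, f) -> (Z, z)
    for the illustrative functor LX = O x X."""
    return all(z[h[x]] == (f[x][0], h[f[x][1]]) for x in f)


def kernel_bisimulation(h1, h2):
    """Pullback in Set of the cospan X -h1-> Z <-h2- X: all pairs (x, y)
    that the two maps send to the same element of Z."""
    return {(x, y) for x in h1 for y in h2 if h1[x] == h2[y]}


# Hypothetical coalgebra on X = {0, 1, 2}: f(x) = (output, successor).
f = {0: ("a", 1), 1: ("a", 0), 2: ("b", 2)}
# Target coalgebra on Z = {"p", "q"} and a homomorphism h : X -> Z.
z = {"p": ("a", "p"), "q": ("b", "q")}
h = {0: "p", 1: "p", 2: "q"}

kernel = kernel_bisimulation(h, h)
```

Here states 0 and 1 are identified by h, so the kernel relates them to each other, while state 2 is related only to itself.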

Assuming an invertible step δ : BQ → QL, we would like to relate R to a kernel bisimulation on the coalgebra Q(X, f) obtained by applying the step-induced lifting of Q. Applying Q to the pullback square for R yields a pullback square as Q is a right adjoint. However, as in our discussion of Aczel-Mendler bisimulations, this may not be a relation. We may try to also use relation liftings here, and take Rel(Q)(R) instead of Q(R), however this may no longer be a pullback. It is not currently clear to us how to resolve these problems in general.

## 6.2 Future work

There are several further directions for future work. First, in this paper we focused primarily on fibrations of *relations*, which suffice for our purposes of studying bisimilarity. However, we expect that some of our results can be generalised to arbitrary (posetal) fibrations. Such a generalisation could be the basis to study preservation and reflection of other coinductive predicates and relations than bisimilarity, which can be formulated in terms of fibrations and liftings (e.g., [25]).

Secondly, while we have shown in Section 5 how our results can be used to recover the central results from [5], the latter have been generalised in two directions: the recent [24] considers bisimulations for Vietoris coalgebras on the category of *arbitrary* topological spaces, while [18] develops a notion of neighbourhood bisimulation for coalgebras that allows one to generalise the results from [5] to a large variety of functors on the category of Stone spaces and their corresponding functors on Set. We would like to understand whether or not our framework is able to recover these generalisations.

Finally, the examples that we have studied in this paper do not yet exploit the full generality of invertible steps: our main motivating examples are based on an Eilenberg-Moore adjunction (or close, as in the example based on Stone spaces). In [41] it is shown that steps are relevant in a much wider setting, for instance when based on a Kleisli adjunction or on contravariant adjunctions and dualities. The latter type of steps are relevant for coalgebraic modal logics—we have studied a first instance in our example of deterministic and non-deterministic automata. Investigating the meaning of invertible steps in these other types of adjunctions is left for future work.

Acknowledgements This research has been partially funded by the NWO grant OCENW.M20.053 and by Leverhulme Trust Research Project Grant RPG-2020-232.

## References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Quantitative Safety and Liveness**

Thomas A. Henzinger, Nicolas Mazzocchi, and N. Ege Saraç

Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria {tah,nmazzocc,esarac}@ist.ac.at

**Abstract.** Safety and liveness are elementary concepts of computation, and the foundation of many verification paradigms. The safety-liveness classification of boolean properties characterizes whether a given property can be falsified by observing a finite prefix of an infinite computation trace (always for safety, never for liveness). In quantitative specification and verification, properties assign not truth values, but quantitative values to infinite traces (e.g., a cost, or the distance to a boolean property). We introduce quantitative safety and liveness, and we prove that our definitions induce conservative quantitative generalizations of both (1) the safety-progress hierarchy of boolean properties and (2) the safety-liveness decomposition of boolean properties. In particular, we show that every quantitative property can be written as the pointwise minimum of a quantitative safety property and a quantitative liveness property. Consequently, like boolean properties, quantitative properties can also be min-decomposed into safety and liveness parts, or alternatively, max-decomposed into co-safety and co-liveness parts. Moreover, quantitative properties can be approximated naturally. We prove that every quantitative property that has both safe and co-safe approximations can be monitored arbitrarily precisely by a monitor that uses only a finite number of states.

## **1 Introduction**

Safety and liveness are elementary concepts in the semantics of computation [39]. They can be explained through the thought experiment of a *ghost monitor*, an imaginary device that watches an infinite computation trace at runtime, one observation at a time, and always maintains the set of *possible prediction values* to reflect the satisfaction of a given property. Let *Φ* be a boolean property, meaning that *Φ* divides all infinite traces into those that satisfy *Φ*, and those that violate *Φ*. After any finite number of observations, True is a possible prediction value for *Φ* if the observations seen so far are consistent with an infinite trace that satisfies *Φ*, and False is a possible prediction value for *Φ* if the observations seen so far are consistent with an infinite trace that violates *Φ*. When True is no possible prediction value, the ghost monitor can reject the hypothesis that *Φ* is satisfied. The property *Φ* is *safe* if and only if the ghost monitor can always reject the hypothesis *Φ* after a finite number of observations: if the infinite trace that is being monitored violates *Φ*, then after some finite number of observations, True is no possible prediction value for *Φ*. Orthogonally, the property *Φ* is *live* if and only if the ghost monitor can never reject the hypothesis *Φ* after a finite number of observations: for all infinite traces, after every finite number of observations, True remains a possible prediction value for *Φ*.

The safety-liveness classification of properties is fundamental in verification. In the natural topology on infinite traces (the "Cantor topology"), the safety properties are the closed sets, and the liveness properties are the dense sets [4]. For every property *Φ*, the location of *Φ* within the Borel hierarchy that is induced by the Cantor topology (the so-called "safety-progress hierarchy" [17]) indicates the level of difficulty encountered when verifying *Φ*. On the first level, we find the safety and co-safety properties, the latter being the complements of safety properties, i.e., the properties whose falsehood (rather than truth) can always be rejected after a finite number of observations by the ghost monitor. More sophisticated verification techniques are needed for second-level properties, which are the countable boolean combinations of first-level properties (the so-called "response" and "persistence" properties [17]). Moreover, the orthogonality of safety and liveness leads to the following celebrated fact: *every* property can be written as the intersection of a safety property and a liveness property [4]. This means that every property *Φ* can be decomposed into two parts: a safety part, which is amenable to simple verification techniques such as invariants, and a liveness part, which requires heavier verification paradigms such as ranking functions. Dually, there is always a disjunctive decomposition of *Φ* into co-safety and co-liveness.
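For the boolean case, the decomposition can be made explicit by a standard topological argument, sketched here as an aside. The notation $\mathrm{cl}(\Phi)$ for the closure of $\Phi \subseteq \Sigma^\omega$ in the Cantor topology is an assumption of this sketch and is not used elsewhere in the text.

$$\Phi \;=\; \underbrace{\mathrm{cl}(\Phi)}_{\text{closed, hence safe}} \;\cap\; \underbrace{\bigl(\Phi \cup (\Sigma^\omega \setminus \mathrm{cl}(\Phi))\bigr)}_{\text{dense, hence live}}$$

The equality holds because $\Phi \subseteq \mathrm{cl}(\Phi)$. The second set is dense because every finite prefix either has an infinite extension in $\mathrm{cl}(\Phi)$, in which case it also has one in $\Phi$, or all of its extensions lie in $\Sigma^\omega \setminus \mathrm{cl}(\Phi)$.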

So far, we have retold the well-known story of safety and liveness for *boolean* properties. A boolean property *Φ* is formalized mathematically as the *set* of infinite computation traces that satisfy *Φ*, or equivalently, the characteristic *function* that maps each infinite trace to a truth value. Quantitative generalizations of the boolean setting allow us to capture not only correctness properties, but also performance properties [31]. In this paper we reveal the story of safety and liveness for such *quantitative* properties, which are functions from infinite traces to an arbitrary set D of *values*. In order to compare values, we equip the value domain D with a partial order *<*, and we require (D, *<*) to be a complete lattice. The membership problem [18] for an infinite trace *f* and a quantitative property *Φ* asks whether *Φ*(*f*) ≥ *v* for a given threshold value *v* ∈ D. Correspondingly, in our thought experiment, the ghost monitor attempts to reject hypotheses of the form *Φ*(*f*) ≥ *v*, which cannot be rejected as long as all observations seen so far are consistent with an infinite trace *f* with *Φ*(*f*) ≥ *v*. We will define *Φ* to be a *quantitative safety* property if and only if every hypothesis of the form *Φ*(*f*) ≥ *v* can always be rejected by the ghost monitor after a finite number of observations, and we will define *Φ* to be a *quantitative liveness* property if and only if some hypothesis of the form *Φ*(*f*) ≥ *v* can never be rejected by the ghost monitor after any finite number of observations. We note that in the quantitative case, after every finite number of observations, the set of possible prediction values for *Φ* maintained by the ghost monitor may be finite or infinite, and in the latter case, it may not contain a minimal or maximal element.

Let us give a few examples. Suppose we have four observations: observation rq for "request a resource," observation gr for "grant the resource," observation tk for "clock tick," and observation oo for "other." The boolean property Resp requires that every occurrence of rq in an infinite trace is followed eventually by an occurrence of gr. The boolean property NoDoubleReq requires that no occurrence of rq is followed by another rq without some gr in between. The quantitative property MinRespTime maps every infinite trace to the largest number *k* such that there are at least *k* occurrences of tk between each rq and the closest subsequent gr. The quantitative property MaxRespTime maps every infinite trace to the smallest number *k* such that there are at most *k* occurrences of tk between each rq and the closest subsequent gr. The quantitative property AvgRespTime maps every infinite trace to the lower limit value lim inf of the infinite sequence (*v<sub>i</sub>*)<sub>*i*≥1</sub>, where *v<sub>i</sub>* is, for the first *i* occurrences of tk, the average number of occurrences of tk between rq and the closest subsequent gr. Note that the values of AvgRespTime can be ∞ for some computations, including those for which the value of Resp is True. This highlights that boolean properties are not embedded in the limit behavior of quantitative properties.

The boolean property Resp is live because every finite observation sequence can be extended with an occurrence of gr. In fact, Resp is a second-level liveness property (namely, a response property), because it can be written as a countable intersection of co-safety properties. The boolean property NoDoubleReq is safe because if it is violated, it will be rejected by the ghost monitor after a finite number of observations, namely, as soon as the ghost monitor sees a rq followed by another occurrence of rq without an intervening gr. According to our quantitative generalization of safety, MinRespTime is a safety property. The ghost monitor always maintains the minimal number *k* of occurrences of tk between any past rq and the closest subsequent gr seen so far; the set of possible prediction values for MinRespTime is always {0, 1, …, *k*}. Every hypothesis of the form "the MinRespTime-value is at least *v*" is rejected by the ghost monitor as soon as *k* < *v*; if such a hypothesis is violated, this will happen after some finite number of observations. Symmetrically, the quantitative property MaxRespTime is co-safe, because every wrong hypothesis of the form "the MaxRespTime-value is at most *v*" will be rejected by the ghost monitor as soon as the smallest possible prediction value for MaxRespTime, which is the maximal number of occurrences of tk between any past rq and the closest subsequent gr seen so far, goes above *v*. By contrast, the quantitative property AvgRespTime is both live and co-live because no hypothesis of the form "the AvgRespTime-value is at least *v*," nor of the form "the AvgRespTime-value is at most *v*," can ever be rejected by the ghost monitor after a finite number of observations. All nonnegative real numbers and ∞ always remain possible prediction values for AvgRespTime.

Note that a ghost monitor that attempts to reject hypotheses of the form *Φ*(*f*) ≥ *v* does not need to maintain the entire set of possible prediction values, but only the sup of the set of possible prediction values, and whether or not the sup is contained in the set. Dually, updating inf (and whether it is contained) suffices to reject hypotheses of the form *Φ*(*f*) ≤ *v*.
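To make the ghost monitors for MinRespTime and MaxRespTime concrete, here is a minimal Python sketch of the bookkeeping described above. It is an illustration under assumptions of ours: the class and method names are hypothetical, only completed rq/gr pairs are taken into account (a still-pending request is ignored), and a repeated rq before a gr restarts the count used for the minimum while the first pending rq is used for the maximum.

```python
import math


class RespTimeMonitor:
    """Tracks the sup of the possible MinRespTime values and the inf of the
    possible MaxRespTime values over the observations seen so far."""

    def __init__(self):
        self.pending = False        # is some rq still waiting for a gr?
        self.since_first_rq = 0     # ticks since the earliest pending rq
        self.since_last_rq = 0      # ticks since the latest pending rq
        self.min_seen = math.inf    # minimal completed response time so far
        self.max_seen = 0           # maximal completed response time so far

    def observe(self, obs):
        if obs == "rq":
            if not self.pending:
                self.pending = True
                self.since_first_rq = 0
            self.since_last_rq = 0
        elif obs == "tk" and self.pending:
            self.since_first_rq += 1
            self.since_last_rq += 1
        elif obs == "gr" and self.pending:
            self.min_seen = min(self.min_seen, self.since_last_rq)
            self.max_seen = max(self.max_seen, self.since_first_rq)
            self.pending = False
        # "oo", and tk or gr without a pending request, change nothing

    def rejects_min_at_least(self, v):
        """Can the hypothesis 'MinRespTime >= v' already be rejected?"""
        return self.min_seen < v

    def rejects_max_at_most(self, v):
        """Can the hypothesis 'MaxRespTime <= v' already be rejected?"""
        return self.max_seen > v


def run(observations):
    monitor = RespTimeMonitor()
    for obs in observations:
        monitor.observe(obs)
    return monitor


first = run(["rq", "tk", "tk", "gr"])
second = run(["rq", "tk", "tk", "gr", "oo", "rq", "tk", "gr"])
```

After the shorter prefix the hypothesis "MinRespTime is at least 2" is still consistent with the observations; the second, faster response in the longer prefix allows the monitor to reject it.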

By defining quantitative safety and liveness via ghost monitors, we not only obtain a conservative and quantitative generalization of the boolean story, but also open up attractive frontiers for quantitative semantics, monitoring, and verification. For example, while the approximation of boolean properties reduces to adding and removing traces to and from a set, the approximation of quantitative properties offers a rich landscape of possibilities. In fact, we can approximate the notion of safety itself. Given an error bound *α*, the quantitative property *Φ* is *α-safe* if and only if for every value *v* and every infinite trace *f* whose value *Φ*(*f*) is less than *v*, all possible prediction values for *Φ* are less than *v* + *α* after some finite prefix of *f*. This means that, for an *α*-safe property *Φ*, the ghost monitor may not reject wrong hypotheses of the form *Φ*(*f*) ≥ *v* after a finite number of observations, once the violation is below the error bound. We show that every quantitative property that is both *α*-safe and *β*-co-safe, for any finite *α* and *β*, can be monitored arbitrarily precisely by a monitor that uses only a finite number of states.
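Read literally, the informal description of *α*-safety above can be transcribed as the following condition. This is only a sketch of ours: it uses the prefix notation *s* ≺ *f* of Section 2, writes *sg* for the concatenation of a finite trace *s* with an infinite trace *g*, and assumes a value domain D on which addition is defined (for instance the extended reals); the formal definition in Section 6 may be phrased differently.

$$\Phi \text{ is } \alpha\text{-safe} \iff \forall f \in \Sigma^\omega\ \ \forall v \in \mathbb{D}:\ \ \Phi(f) < v \implies \exists s \prec f\ \ \forall g \in \Sigma^\omega:\ \ \Phi(sg) < v + \alpha$$

For *α* = 0 the condition states that every wrong hypothesis *Φ*(*f*) ≥ *v* is rejected after some finite prefix, which is the informal notion of quantitative safety described in the introduction.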

We are not the first to propose quantitative (or multi-valued) definitions of safety and liveness [41,27]. While the previously proposed quantitative generalizations of safety share strong similarities with our definition (without coinciding completely), our quantitative generalization of liveness is entirely new. The definitions of [27] do not support any safety-liveness decomposition, because their notion of safety is too permissive, and their liveness too restrictive. While the definitions of [41] admit a safety-liveness decomposition, our definition of liveness captures strictly fewer properties. Consequently, our definitions offer a stronger safety-liveness decomposition theorem. Our definitions also fit naturally with the definitions of emptiness, equivalence, and inclusion for quantitative languages [18].

**Overview.** In Section 2, we introduce quantitative properties. In Section 3, we define quantitative safety as well as safety closure, namely, the property that increases the value of each trace as little as possible to achieve safety. Then, we prove that our definitions preserve classical boolean facts. In particular, we show that a quantitative property *Φ* is safe if and only if *Φ* equals its safety closure if and only if *Φ* is upper semicontinuous. In Section 4, we generalize the safety-progress hierarchy to quantitative properties. We first define limit properties. For ℓ ∈ {inf, sup, lim inf, lim sup}, the class of ℓ-properties captures those for which the value of each infinite trace can be derived by applying the limit function ℓ to the infinite sequence of values of finite prefixes. We prove that inf-properties coincide with safety, sup-properties with co-safety, lim inf-properties are suprema of countably many safety properties, and lim sup-properties infima of countably many co-safety properties. The lim inf-properties generalize the boolean persistence properties of [17]; the lim sup-properties generalize their response properties. For example, AvgRespTime is a lim inf-property. In Section 5, we introduce quantitative liveness and co-liveness. We prove that our definitions preserve the classical boolean facts, and show that there is a unique property which is both safe and live. As our main result, we provide a safety-liveness decomposition that holds for every quantitative property. In Section 6, we define approximate safety and co-safety. We generalize the well-known unfolding approximation of discounted properties for approximate safety and co-safety properties over the extended reals. This allows us to provide a finite-state approximate monitor for these properties. In Section 7, we conclude with future research directions. For complete proofs of all results, we refer the reader to the full version of the paper.

**Related Work.** The notions of safety and liveness for boolean properties appeared first in [39] and were later formalized in [4], where safety properties were characterized as closed sets of the Cantor topology on infinite traces, and liveness properties as dense sets. As a consequence, the seminal decomposition theorem followed: every boolean property is an intersection of a safety property and a liveness property. A benefit of such a decomposition lies in the difference between the mathematical arguments used in their verification. While safety properties enable simpler methods such as invariants, liveness properties require more complex approaches such as well-foundedness [42,5]. These classes were characterized in terms of Büchi automata in [5] and in terms of linear temporal logic in [46].

The safety-progress classification of boolean properties [17] proposes an orthogonal view: rather than partitioning the set of properties, it provides a hierarchy of properties starting from safety. This yields a more fine-grained view of nonsafety properties which distinguishes whether a "good thing" happens at least once (co-safety or "guarantee"), infinitely many times (response), or eventually always (persistence). This classification follows the Borel hierarchy that is induced by the Cantor topology on infinite traces, and has corresponding projections within properties that are definable by finite automata and by formulas of linear temporal logic.

Runtime verification, or monitoring, is a lightweight, dynamic verification technique [6], where a monitor watches a system during its execution and tries to decide, after each finite sequence of observations, whether the observed finite computation trace or its unknown infinite extension satisfies a desired property. The safety-liveness dichotomy has profound implications for runtime verification as well: safety is easy to monitor [28], while liveness is not. An early definition of boolean monitorability was equivalent to safety with recursively enumerable sets of bad prefixes [35]. The monitoring of infinite-state boolean safety properties was later studied in [26]. A more popular definition of boolean monitorability [44,8] accounts for both truth and falsehood, establishing the set of monitorable properties as a strict superset of finite boolean combinations of safety and co-safety [23]. Boolean monitors that use the set of possible prediction values can be found in [7]. The notion of boolean monitorability was investigated through the safety-liveness lens in [43] and through the safety-progress lens in [23].

Quantitative properties (a.k.a. "quantitative languages") [18] extend their boolean counterparts by moving from the two-valued truth domain to richer domains such as real numbers. Such properties have been extensively studied from a static verification perspective in the past decade, e.g., in the context of model-checking probabilistic properties [38,37], games with quantitative objectives [10,15], specifying quantitative properties [11,1], measuring distances between systems [2,16,22,29], best-effort synthesis and repair [9,20], and quantitative analysis of transition systems [47,14,21,19]. More recently, quantitative properties have also been studied from a runtime verification perspective, e.g., for limit monitoring of statistical indicators of infinite traces [25] and for analyzing resource-precision trade-offs in the design of quantitative monitors [33,30].

To the best of our knowledge, previous definitions of (approximate) safety and liveness in nonboolean domains make implicit assumptions about the specification language [48,34,24,45]. We identify two notable exceptions. In [27], the authors generalize the framework of [43] to nonboolean value domains. They provide neither a safety-liveness decomposition of quantitative properties, nor a fine-grained classification of nonsafety properties. In [41], the authors present a safety-liveness decomposition and some levels of the safety-progress hierarchy on multi-valued truth domains, which are bounded distributive lattices. Their motivation is to provide algorithms for model-checking properties on multi-valued truth domains. We present the relationships between their definitions and ours in the relevant sections below.

## **2 Quantitative Properties**

Let *Σ* = {*a*, *b*, …} be a finite alphabet of observations. A *trace* is an infinite sequence of observations, denoted by *f*, *g*, *h* ∈ *Σ*<sup>*ω*</sup>, and a *finite trace* is a finite sequence of observations, denoted by *s*, *r*, *t* ∈ *Σ*<sup>∗</sup>. Given *s* ∈ *Σ*<sup>∗</sup> and *w* ∈ *Σ*<sup>∗</sup> ∪ *Σ*<sup>*ω*</sup>, we denote by *s* ≺ *w* (resp. *s* ⪯ *w*) that *s* is a strict (resp. nonstrict) prefix of *w*. Furthermore, we denote by |*w*| the length of *w* and, given *a* ∈ *Σ*, by |*w*|<sub>*a*</sub> the number of occurrences of *a* in *w*.

A *value domain* $\mathbb{D}$ is a poset. Unless otherwise stated, we assume that $\mathbb{D}$ is a nontrivial (i.e., $\bot \neq \top$) complete lattice and, whenever appropriate, we write $0, 1, -\infty, \infty$ instead of $\bot$ and $\top$ for the least and the greatest elements. We respectively use the terms minimum and maximum for the greatest lower bound and the least upper bound of finitely many elements.

**Definition 1 (Property).** *A* quantitative property *(or simply* property*) is a function* $\Phi : \Sigma^\omega \to \mathbb{D}$ *from the set of all traces to a value domain.*

A boolean property $P \subseteq \Sigma^\omega$ is defined as a set of traces. We use the boolean domain $\mathbb{B} = \{0, 1\}$ with $0 < 1$ and, in place of $P$, its *characteristic property* $\Phi_P : \Sigma^\omega \to \mathbb{B}$, which is defined by $\Phi_P(f) = 1$ if $f \in P$, and $\Phi_P(f) = 0$ if $f \notin P$.

For all properties $\Phi_1, \Phi_2$ on a domain $\mathbb{D}$ and all traces $f \in \Sigma^\omega$, we let $\min(\Phi_1, \Phi_2)(f) = \min(\Phi_1(f), \Phi_2(f))$ and $\max(\Phi_1, \Phi_2)(f) = \max(\Phi_1(f), \Phi_2(f))$. For a domain $\mathbb{D}$, the *inverse* of $\mathbb{D}$ is the domain $\overline{\mathbb{D}}$ that contains the same elements as $\mathbb{D}$ but with the ordering reversed. For a property $\Phi$, we define its *complement* $\overline{\Phi} : \Sigma^\omega \to \overline{\mathbb{D}}$ by $\overline{\Phi}(f) = \Phi(f)$ for all $f \in \Sigma^\omega$.

Some properties can be defined as limits of value sequences. A *finitary property* $\pi : \Sigma^* \to \mathbb{D}$ associates a value with each finite trace. A *value function* $\ell : \mathbb{D}^\omega \to \mathbb{D}$ condenses an infinite sequence of values to a single value. Given a finitary property $\pi$, a value function $\ell$, and a trace $f \in \Sigma^\omega$, we write $\ell_{s \prec f}\, \pi(s)$ instead of $\ell(\pi(s_0)\pi(s_1)\ldots)$, where each $s_i$ fulfills $s_i \prec f$ and $|s_i| = i$.

## **3 Quantitative Safety**

Given a property *Φ* : *Σ<sup>ω</sup>* <sup>→</sup> <sup>D</sup>, a trace *f* <sup>∈</sup> *Σ<sup>ω</sup>*, and a value *v* <sup>∈</sup> <sup>D</sup>, the quantitative membership problem [18] asks whether *Φ*(*f*) <sup>≥</sup> *v*. We define quantitative safety as follows: the property *Φ* is safe iff every wrong hypothesis of the form *Φ*(*f*) <sup>≥</sup> *v* has a finite witness *s* <sup>≺</sup> *f*.

**Definition 2 (Safety).** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is* safe *iff for every* $f \in \Sigma^\omega$ *and value* $v \in \mathbb{D}$ *with* $\Phi(f) \not\geq v$*, there is a prefix* $s \prec f$ *such that* $\sup_{g \in \Sigma^\omega} \Phi(sg) \not\geq v$*.*

Let us illustrate this definition with the *minimal response-time* property.

*Example 3.* Let $\Sigma = \{\mathsf{rq}, \mathsf{gr}, \mathsf{tk}, \mathsf{oo}\}$ and $\mathbb{D} = \mathbb{N} \cup \{\infty\}$. We define the minimal response-time property $\Phi_{\min}$ through an auxiliary finitary property $\pi_{\min}$ that computes the minimum response time so far. In a finite or infinite trace, an occurrence of rq is *granted* if it is followed, later, by a gr, and otherwise it is *pending*. Let $\pi_{\mathrm{last}}(s) = \infty$ if the finite trace $s$ contains a pending rq, or no rq, and $\pi_{\mathrm{last}}(s) = |r|_{\mathsf{tk}} - |t|_{\mathsf{tk}}$ otherwise, where $r \prec s$ is the longest prefix of $s$ with a pending rq, and $t \prec r$ is the longest prefix of $r$ without pending rq. Intuitively, $\pi_{\mathrm{last}}$ provides the response time for the last request when all requests are granted, and $\infty$ when there is a pending request or no request. Given $s \in \Sigma^*$, taking the minimum of the values of $\pi_{\mathrm{last}}$ over the prefixes $r \preceq s$ gives us the minimum response time so far. Let $\pi_{\min}(s) = \min_{r \preceq s} \pi_{\mathrm{last}}(r)$ for all $s \in \Sigma^*$, and $\Phi_{\min}(f) = \lim_{s \prec f} \pi_{\min}(s)$ for all $f \in \Sigma^\omega$. The limit always exists because the minimum is monotonically decreasing.

The minimal response-time property is safe. Let $f \in \Sigma^\omega$ and $v \in \mathbb{D}$ such that $\Phi_{\min}(f) < v$. Then, some prefix $s \prec f$ contains a rq that is granted after $u < v$ ticks, in which case, no matter what happens in the future, the minimal response time is guaranteed to be at most $u$; that is, $\sup_{g \in \Sigma^\omega} \Phi_{\min}(sg) \leq u < v$. Recall from the introduction the ghost monitor that maintains the sup of possible prediction values. For the minimal response-time property, that value is always $\pi_{\min}$; that is, $\sup_{g \in \Sigma^\omega} \Phi_{\min}(sg) = \pi_{\min}(s)$ for all $s \in \Sigma^*$. Note that in the case of minimal response time, the sup of possible prediction values is always realizable; that is, for all $s \in \Sigma^*$, there exists an $f \in \Sigma^\omega$ such that $\sup_{g \in \Sigma^\omega} \Phi_{\min}(sg) = \Phi_{\min}(sf)$.
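The definitions of Example 3 can be sketched as a streaming computation over finite traces. The Python below is only an illustration: the function names `pi_last_values` and `pi_min_values`, and the encoding of observations as the strings `"rq"`, `"gr"`, `"tk"`, `"oo"`, are assumptions of this sketch. It relies on the reading that, by the choice of $r$ and $t$, the response time is counted from the earliest request that is pending when the grant arrives.

```python
import math

def pi_last_values(trace):
    """Yield pi_last(s) for every prefix s of `trace`, starting with the empty prefix."""
    pending = False   # is some request still waiting for a grant?
    ticks = 0         # ticks observed since the earliest pending request
    last = math.inf   # response time of the most recently granted request
    yield math.inf    # empty prefix: it contains no rq
    for obs in trace:
        if obs == "rq" and not pending:
            pending, ticks = True, 0
        elif obs == "tk" and pending:
            ticks += 1
        elif obs == "gr" and pending:
            pending, last = False, ticks
        # pi_last is infinite while a request is pending or none was seen
        yield math.inf if pending else last

def pi_min_values(trace):
    """Yield pi_min(s), the running minimum of pi_last over the prefixes of s."""
    best = math.inf
    for value in pi_last_values(trace):
        best = min(best, value)
        yield best

if __name__ == "__main__":
    example = ["rq", "tk", "tk", "gr", "oo", "rq", "tk", "gr", "rq", "tk"]
    print(list(pi_min_values(example)))
```

Since the running minimum never increases, the last value produced for a prefix is an upper bound on the value of every infinite extension, which matches the role of the ghost monitor described above.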

*Remark 4.* Quantitative safety generalizes boolean safety. For every boolean property $P \subseteq \Sigma^\omega$, the following statements are equivalent: (i) $P$ is safe according to the classical definition [4], (ii) its characteristic property $\Phi_P$ is safe, and (iii) for every $f \in \Sigma^\omega$ and $v \in \mathbb{B}$ with $\Phi_P(f) < v$, there exists a prefix $s \prec f$ such that for all $g \in \Sigma^\omega$, we have $\Phi_P(sg) < v$.

We now generalize the notion of safety closure and present an operation that makes a property safe by increasing the value of each trace as little as possible.

**Definition 5 (Safety closure).** *The* safety closure *of a property* $\Phi$ *is the property* $\Phi^*$ *defined by* $\Phi^*(f) = \inf_{s \prec f} \sup_{g \in \Sigma^\omega} \Phi(sg)$ *for all* $f \in \Sigma^\omega$*.*

We can say the following about the safety closure operation.

**Proposition 6.** *For every property Φ* : *Σ<sup>ω</sup>* <sup>→</sup> <sup>D</sup>*, the following statements hold.*



## **3.1 Alternative Characterizations of Quantitative Safety**

Consider a trace and its prefixes of increasing length. For a given property, the ghost monitor from the introduction maintains, for each prefix, the sup of possible prediction values, i.e., the least upper bound of the property values for all possible infinite continuations. The resulting sequence of monotonically decreasing suprema provides an upper bound on the eventual property value. Moreover, for some properties, this sequence always converges to the property value. If this is the case, then the ghost monitor can always dismiss wrong lower-bound hypotheses after finite prefixes, and vice versa. This gives us an alternative definition for the safety of quantitative properties which, inspired by the notion of Scott continuity, was called *continuity* [33]. We now believe that *upper semicontinuity* is a more appropriate term, as becomes clear when we consider the Cantor topology on $\Sigma^\omega$ and the value domain $\mathbb{R} \cup \{-\infty, +\infty\}$.

**Definition 7 (Upper semicontinuity [33]).** *A property* $\Phi$ *is* upper semicontinuous *iff* $\Phi(f) = \lim_{s \prec f} \sup_{g \in \Sigma^\omega} \Phi(sg)$ *for all* $f \in \Sigma^\omega$*.*

We note that the minimal response-time property is upper semicontinuous.

*Example 8.* Recall the minimal response-time property $\Phi_{\min}$ from Example 3. For every trace $f \in \Sigma^\omega$, the $\Phi_{\min}$ value is the limit of the $\pi_{\min}$ values for the prefixes of $f$. Therefore, $\Phi_{\min}$ is upper semicontinuous.

In general, a property is safe iff it maps every trace to the limit of the suprema of possible prediction values. Moreover, we can also characterize safety properties as the properties that are equal to their safety closure.

**Theorem 9.** *For every property* $\Phi$*, the following statements are equivalent:*

1. $\Phi$ *is safe.*
2. $\Phi$ *is upper semicontinuous.*
3. $\Phi(f) = \Phi^*(f)$ *for all* $f \in \Sigma^\omega$*.*

## **3.2 Related Definitions of Quantitative Safety**

In [41], the authors consider the model-checking problem for properties on multi-valued truth domains. They introduce the notion of multi-safety through a closure operation that coincides with our safety closure. Formally, a property $\Phi$ is *multi-safe* iff $\Phi(f) = \Phi^*(f)$ for every $f \in \Sigma^\omega$. It is easy to see the following.

**Proposition 10.** *A property* $\Phi$ *is multi-safe iff* $\Phi$ *is safe.*

Although the two definitions of safety are equivalent, our definition is consistent with the membership problem for quantitative automata and motivated by the monitoring of quantitative properties.

In [27], the authors extend a refinement of the safety-liveness classification for monitoring [43] to richer domains. They introduce the notion of verdict-safety through dismissibility of values not less than or equal to the property value. Formally, a property $\Phi$ is *verdict-safe* iff for every $f \in \Sigma^\omega$ and $v \not\leq \Phi(f)$, there exists a prefix $s \prec f$ such that for all $g \in \Sigma^\omega$, we have $\Phi(sg) \neq v$.

We demonstrate that verdict-safety is weaker than safety. Moreover, we provide a condition under which the two definitions coincide. To achieve this, we reason about sets of possible prediction values: for a property $\Phi$ and $s \in \Sigma^*$, let $P_{\Phi,s} = \{\Phi(sf) \mid f \in \Sigma^\omega\}$.

**Lemma 11.** *A property* $\Phi$ *is verdict-safe iff* $\Phi(f) = \sup(\lim_{s \prec f} P_{\Phi,s})$ *for all* $f \in \Sigma^\omega$*.*

Notice that $\Phi$ is safe iff $\Phi(f) = \lim_{s \prec f} (\sup P_{\Phi,s})$ for all $f \in \Sigma^\omega$. Below we describe a property that is verdict-safe but not safe.

*Example 12.* Let $\Sigma = \{a, b\}$. Define $\Phi$ by $\Phi(f) = 0$ if $f = a^\omega$, and $\Phi(f) = |s|$ otherwise, where $s \prec f$ is the shortest prefix in which $b$ occurs. The property $\Phi$ is verdict-safe. First, observe that $\mathbb{D} = \mathbb{N} \cup \{\infty\}$. Let $f \in \Sigma^\omega$ and $v \in \mathbb{D}$ with $v > \Phi(f)$. If $\Phi(f) > 0$, then $f$ contains $b$, and $\Phi(f) = |s|$ for some $s \prec f$ in which $b$ occurs for the first time. After the prefix $s$, all $g \in \Sigma^\omega$ yield $\Phi(sg) = |s|$, thus all values above $|s|$ are rejected. If $\Phi(f) = 0$, then $f = a^\omega$. Let $v \in \mathbb{D}$ with $v > 0$, and consider the prefix $a^v \prec f$. Observe that the set of possible prediction values after reading $a^v$ is $\{0, v+1, v+2, \ldots\}$, therefore $a^v$ allows the ghost monitor to reject the value $v$. However, $\Phi$ is not safe because, although $\Phi(a^\omega) = 0$, for every $s \prec a^\omega$, we have $\sup_{g \in \Sigma^\omega} \Phi(sg) = \infty$.
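The separation in Example 12 can be made concrete by evaluating both limits at the trace $a^\omega$. The computation below is a sketch under one assumption that this section does not spell out: the limit of the decreasing sequence of sets $P_{\Phi,s}$ along the prefixes of a trace is read as their intersection.

```latex
\begin{align*}
P_{\Phi,a^n} &= \{0\} \cup \{n+1, n+2, \ldots\}
  && \text{for every } n \in \mathbb{N},\\
\lim_{s \prec a^\omega} \bigl(\sup P_{\Phi,s}\bigr)
  &= \lim_{n \to \infty} \infty = \infty \neq 0 = \Phi(a^\omega)
  && \text{(so $\Phi$ is not safe)},\\
\sup\Bigl(\lim_{s \prec a^\omega} P_{\Phi,s}\Bigr)
  &= \sup \bigcap_{n \in \mathbb{N}} P_{\Phi,a^n} = \sup \{0\} = 0 = \Phi(a^\omega)
  && \text{(consistent with verdict-safety)}.
\end{align*}
```

The two values differ precisely because the supremum $\infty$ of each $P_{\Phi,a^n}$ is not itself an element of the set.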

The separation is due to the fact that, for some finite traces, the sup of possible prediction values cannot be realized by any future. Below, we present a condition that prevents such cases.

**Definition 13 (Supremum closedness).** *A property* $\Phi$ *is* sup-closed *iff for every* $s \in \Sigma^*$ *we have* $\sup P_{\Phi,s} \in P_{\Phi,s}$*.*

We remark that the minimal response-time property is sup-closed.

*Example 14.* The safety property minimal response-time $\Phi_{\min}$ from Example 3 is sup-closed. This is because, for every $s \in \Sigma^*$, a continuation in which no further request is granted, such as $\mathsf{oo}^\omega$, realizes the value $\sup_{g \in \Sigma^\omega} \Phi_{\min}(sg) = \pi_{\min}(s)$: along such a continuation, $\pi_{\mathrm{last}}$ never takes a value below $\pi_{\min}(s)$.

Recall from the introduction the ghost monitor that maintains the sup of possible prediction values. For monitoring sup-closed properties this suffices; otherwise the ghost monitor also needs to maintain whether or not the supremum of the possible prediction values is realizable by some future continuation. In general, we have the following for every sup-closed property.

**Lemma 15.** *For every* sup*-closed property* $\Phi$ *and for all* $f \in \Sigma^\omega$*, we have* $\lim_{s \prec f} (\sup P_{\Phi,s}) = \sup(\lim_{s \prec f} P_{\Phi,s})$*.*

As a consequence of the lemmas above, we get the following.

**Theorem 16.** *A* sup*-closed property* $\Phi$ *is safe iff* $\Phi$ *is verdict-safe.*

## **4 The Quantitative Safety-Progress Hierarchy**

Our quantitative extension of safety closure allows us to build a Borel hierarchy, which is a quantitative extension of the boolean safety-progress hierarchy [17]. First, we show that safety properties are closed under pairwise min and max.

**Proposition 17.** *For every value domain* D*, the set of safety properties over* D *is closed under* min *and* max*.*


The boolean safety-progress classification of properties is a Borel hierarchy built from the Cantor topology of traces. Safety and co-safety properties lie on the first level, respectively corresponding to the closed sets and open sets of the topology. The second level is obtained through countable unions and intersections of properties from the first level: persistence properties are countable unions of closed sets, while response properties are countable intersections of open sets. We generalize this construction to the quantitative setting.

In the boolean case, each property class is defined through an operation that takes a set $S \subseteq \Sigma^*$ of finite traces and produces a set $P \subseteq \Sigma^\omega$ of infinite traces. For example, to obtain a co-safety property from $S \subseteq \Sigma^*$, the corresponding operation yields $S\Sigma^\omega$. Similarly, we formalize each property class by a value function. For this, we define the notion of *limit property*.

**Definition 18 (Limit property).** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is a* limit property *iff there exists a finitary property* $\pi : \Sigma^* \to \mathbb{D}$ *and a value function* $\ell : \mathbb{D}^\omega \to \mathbb{D}$ *such that* $\Phi(f) = \ell_{s \prec f}\, \pi(s)$ *for all* $f \in \Sigma^\omega$*. We denote this by* $\Phi = (\pi, \ell)$*, and write* $\Phi(s)$ *instead of* $\pi(s)$*. In particular, if* $\Phi = (\pi, \ell)$*, where* $\ell \in \{\inf, \sup, \liminf, \limsup\}$*, then* $\Phi$ *is an* $\ell$-property*.*

To account for the value functions that construct the first two levels of the safety-progress hierarchy, we start our investigation with inf- and sup-properties and later focus on lim inf- and lim sup- properties [18].

#### **4.1 Infimum and Supremum Properties**

Let us start with an example by demonstrating that the minimal response-time property is an inf-property.

*Example 19.* Recall the safety property $\Phi_{\min}$ of minimal response time from Example 3. We can equivalently define $\Phi_{\min}$ as a limit property by taking the finitary property $\pi_{\mathrm{last}}$ and the value function $\inf$. As discussed in Example 3, the function $\pi_{\mathrm{last}}$ outputs the response time for the last request when all requests are granted, and $\infty$ when there is a pending request or no request. Then $\inf_{s \prec f} \pi_{\mathrm{last}}(s) = \Phi_{\min}(f)$ for all $f \in \Sigma^\omega$, and therefore $\Phi_{\min} = (\pi_{\mathrm{last}}, \inf)$.

In fact, the safety properties coincide with inf-properties.

**Theorem 20.** *A property Φ is safe iff Φ is an* inf*-property.*

Defining the minimal response-time property as a limit property, we observe the following relation between its behavior on finite traces and infinite traces.

*Example 21.* Consider the property $\Phi_{\min} = (\pi_{\mathrm{last}}, \inf)$ from Example 19. Let $f \in \Sigma^\omega$ and $v \in \mathbb{D}$. Observe that if the minimal response time of $f$ is at least $v$, then the last response time for each prefix $s \prec f$ is also at least $v$. Conversely, if the minimal response time of $f$ is below $v$, then there is a prefix $s \prec f$ for which the last response time is also below $v$.

In light of this observation, we provide another characterization of safety properties, explicitly relating the specified behavior of the limit property on finite and infinite traces.

**Theorem 22.** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is safe iff* $\Phi$ *is a limit property such that for every* $f \in \Sigma^\omega$ *and value* $v \in \mathbb{D}$*, we have* $\Phi(f) \geq v$ *iff* $\Phi(s) \geq v$ *for all* $s \prec f$*.*

Recall that a safety property allows rejecting wrong lower-bound hypotheses with a finite witness, by assigning a tight upper bound to each trace. We define co-safety properties symmetrically: a property *Φ* is co-safe iff every wrong hypothesis of the form *Φ*(*f*) <sup>≤</sup> *v* has a finite witness *s* <sup>≺</sup> *f*.

**Definition 23 (Co-safety).** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is* co-safe *iff for every* $f \in \Sigma^\omega$ *and value* $v \in \mathbb{D}$ *with* $\Phi(f) \not\leq v$*, there exists a prefix* $s \prec f$ *such that* $\inf_{g \in \Sigma^\omega} \Phi(sg) \not\leq v$*.*

We note that our definition generalizes boolean co-safety, and thus a dual of Remark 4 holds also for co-safety. Moreover, we analogously define the notions of co-safety closure and lower semicontinuity.

**Definition 24 (Co-safety closure).** *The* co-safety closure *of a property* $\Phi$ *is the property* $\Phi_*$ *defined by* $\Phi_*(f) = \sup_{s \prec f} \inf_{g \in \Sigma^\omega} \Phi(sg)$ *for all* $f \in \Sigma^\omega$*.*

**Definition 25 (Lower semicontinuity [33]).** *A property* $\Phi$ *is* lower semicontinuous *iff* $\Phi(f) = \lim_{s \prec f} \inf_{g \in \Sigma^\omega} \Phi(sg)$ *for all* $f \in \Sigma^\omega$*.*

Now, we define and investigate the *maximal response-time* property. In particular, we show that it is a sup-property that is co-safe and lower semicontinuous.

*Example 26.* Let $\Sigma = \{\mathsf{rq}, \mathsf{gr}, \mathsf{tk}, \mathsf{oo}\}$ and $\mathbb{D} = \mathbb{N} \cup \{\infty\}$. We define the maximal response-time property $\Phi_{\max}$ through a finitary property that computes the current response time for each finite trace and the value function $\sup$. In particular, for all $s \in \Sigma^*$, let $\pi_{\mathrm{curr}}(s) = |s|_{\mathsf{tk}} - |r|_{\mathsf{tk}}$, where $r \preceq s$ is the longest prefix of $s$ without pending rq; then $\Phi_{\max} = (\pi_{\mathrm{curr}}, \sup)$. Note the contrast between $\pi_{\mathrm{curr}}$ and $\pi_{\mathrm{last}}$ from Example 3. While $\pi_{\mathrm{curr}}$ takes an optimistic view of the future and assumes the gr will follow immediately, $\pi_{\mathrm{last}}$ takes a pessimistic view and assumes the gr will never follow. Let $f \in \Sigma^\omega$ and $v \in \mathbb{D}$. If the maximal response time of $f$ is greater than $v$, then for some prefix $s \prec f$ the current response time is also greater than $v$, which means that, no matter what happens in the future, the maximal response time is greater than $v$ after observing $s$. Therefore, $\Phi_{\max}$ is co-safe. By a similar reasoning, the sequence of greatest lower bounds of possible prediction values over the prefixes converges to the property value. In other words, we have $\lim_{s \prec f} \inf_{g \in \Sigma^\omega} \Phi_{\max}(sg) = \Phi_{\max}(f)$ for all $f \in \Sigma^\omega$. Thus $\Phi_{\max}$ is also lower semicontinuous, and it equals its co-safety closure. Now, consider the complementary property $\overline{\Phi_{\max}}$, which maps every trace to the same value as $\Phi_{\max}$ on a domain where the order is reversed. It is easy to see that $\overline{\Phi_{\max}}$ is safe.
Finally, recall the ghost monitor from the introduction, which maintains the infimum of possible prediction values for the maximal response-time property. Since the maximal response-time property is inf-closed, the output of the ghost monitor after every prefix is realizable by some future continuation, and that output is $\pi_{\max}(s) = \max_{r \preceq s} \pi_{\mathrm{curr}}(r)$ for all $s \in \Sigma^*$.
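The finitary properties of Example 26 admit the same kind of streaming sketch as those of Example 3. The Python below is illustrative: the names `pi_curr_values` and `pi_max_values` and the string encoding of observations are assumptions of this sketch. It yields $\pi_{\mathrm{curr}}$ and its running maximum $\pi_{\max}$ for every prefix of a finite trace.

```python
def pi_curr_values(trace):
    """Yield pi_curr(s) for every prefix s of `trace`, starting with the empty prefix."""
    pending = False  # is some request still waiting for a grant?
    ticks = 0        # ticks observed since the earliest pending request
    yield 0          # empty prefix: nothing is pending
    for obs in trace:
        if obs == "rq" and not pending:
            pending, ticks = True, 0
        elif obs == "tk" and pending:
            ticks += 1
        elif obs == "gr":
            pending, ticks = False, 0
        # without a pending request, the longest prefix r is s itself
        yield ticks if pending else 0

def pi_max_values(trace):
    """Yield pi_max(s), the running maximum of pi_curr over the prefixes of s."""
    best = 0
    for value in pi_curr_values(trace):
        best = max(best, value)
        yield best

if __name__ == "__main__":
    example = ["rq", "tk", "tk", "gr", "oo", "rq", "tk"]
    print(list(pi_curr_values(example)))
    print(list(pi_max_values(example)))
```

The running maximum never decreases, so each value it produces is a lower bound on the maximal response time of every infinite extension of the prefix read so far.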

Generalizing the observations in the example above, we obtain the following characterizations due to the duality between safety and co-safety.

**Theorem 27.** *For every property <sup>Φ</sup>* : *<sup>Σ</sup><sup>ω</sup>* <sup>→</sup> <sup>D</sup>*, the following are equivalent.*


#### **4.2 Limit Inferior and Limit Superior Properties**

Let us start with an observation on the minimal response-time property.

*Example 28.* Recall once again the minimal response-time property *Φ*min from Example 3. In the previous subsection, we presented an alternative definition of *Φ*min to establish that it is an inf-property. Observe that there is yet another equivalent definition of *Φ*min which takes the monotonically decreasing finitary property *π*min from Example 3 and pairs it with either the value function lim inf, or with lim sup. Hence *Φ*min is both a lim inf- and a lim sup-property.

Before moving on to investigating lim inf- and lim sup-properties more closely, we show that the above observation can be generalized.

**Theorem 29.** *Every* $\ell$*-property* $\Phi$*, for* $\ell \in \{\inf, \sup\}$*, is both a* $\liminf$*- and a* $\limsup$*-property.*

An interesting response-time property beyond safety and co-safety arises when we remove extreme values: instead of minimal response time, consider the property that maps every trace to a value that bounds from below, not all response times, but all of them from a point onward (i.e., all but finitely many). We call this property *tail-minimal response time*.

*Example 30.* Let $\Sigma = \{\mathsf{rq}, \mathsf{gr}, \mathsf{tk}, \mathsf{oo}\}$ and $\pi_{\mathrm{last}}$ be the finitary property from Example 3 that computes the last response time. We define the tail-minimal response-time property as $\Phi_{\mathrm{tmin}} = (\pi_{\mathrm{last}}, \liminf)$. Intuitively, it maps each trace to the least response time over all but finitely many requests. This property is interesting as a performance measure, because it focuses on the long-term performance by ignoring finitely many outliers. Consider $f \in \Sigma^\omega$ and $v \in \mathbb{D}$. Observe that, if the tail-minimal response time of $f$ is at least $v$, then there is a prefix $s \prec f$ such that for all longer prefixes $s \preceq r \prec f$, the last response time in $r$ is at least $v$, and vice versa.

Similarly as for inf-properties, we characterize lim inf-properties through a relation between property behaviors on finite and infinite traces.

**Theorem 31.** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is a* $\liminf$*-property iff* $\Phi$ *is a limit property such that for every* $f \in \Sigma^\omega$ *and value* $v \in \mathbb{D}$*, we have* $\Phi(f) \geq v$ *iff there exists* $s \prec f$ *such that for all* $s \preceq r \prec f$*, we have* $\Phi(r) \geq v$*.*

Now, we show that the tail-minimal response-time property can be expressed as a countable supremum of inf-properties.

*Example 32.* Let $i \in \mathbb{N}$ and define $\pi_{i,\mathrm{last}}$ as a finitary property that imitates $\pi_{\mathrm{last}}$ from Example 3, but ignores the first $i$ observations of every finite trace. Formally, for $s \in \Sigma^*$, we define $\pi_{i,\mathrm{last}}(s) = \pi_{\mathrm{last}}(r)$ for $s = s_i r$ where $s_i \preceq s$ with $|s_i| = i$, and $r \in \Sigma^*$. Observe that an equivalent way to define $\Phi_{\mathrm{tmin}}$ from Example 30 is $\sup_{i \in \mathbb{N}} (\inf_{s \prec f} \pi_{i,\mathrm{last}}(s))$ for all $f \in \Sigma^\omega$. Intuitively, for each $i \in \mathbb{N}$, we obtain an inf-property that computes the minimal response time of the suffixes of a given trace. Taking the supremum over these, we obtain the greatest lower bound on all but finitely many response times.
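The idea of obtaining a limit inferior as a supremum of infima can be checked on plain value sequences. The sketch below does not implement $\pi_{i,\mathrm{last}}$ itself; it illustrates the standard identity $\liminf_j x_j = \sup_i \inf_{j \geq i} x_j$ on an eventually periodic sequence given as a finite `prefix` followed by a repeated nonempty `cycle`. All function names are hypothetical.

```python
import math

def liminf_eventually_periodic(prefix, cycle):
    """lim inf of the sequence prefix . cycle^omega; every cycle value recurs forever."""
    return min(cycle)

def inf_from(prefix, cycle, i):
    """Infimum of the values at positions >= i, for 0 <= i <= len(prefix)."""
    return min(list(prefix[i:]) + list(cycle))

def sup_of_infs(prefix, cycle):
    """sup over i of inf_from(prefix, cycle, i).

    The infima are nondecreasing in i and constant once i >= len(prefix),
    so it suffices to inspect the indices 0 .. len(prefix).
    """
    return max(inf_from(prefix, cycle, i) for i in range(len(prefix) + 1))

if __name__ == "__main__":
    prefix = [math.inf, 1, math.inf]   # a fast early response: an outlier
    cycle = [3, math.inf, 5]           # long-run behaviour
    print(inf_from(prefix, cycle, 0), sup_of_infs(prefix, cycle))
```

On such a sequence the plain infimum is dominated by the early outlier, whereas the supremum of the suffix infima ignores it, which mirrors the difference between minimal and tail-minimal response time.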

We generalize this observation and show that every lim inf-property is a countable supremum of inf-properties.

**Theorem 33.** *Every* lim inf*-property is a countable supremum of* inf*-properties.*

We would also like to have the converse of Theorem 33, i.e., that every countable supremum of inf-properties is a lim inf-property. Currently, we are able to show only the following.

**Theorem 34.** *For every infinite sequence* $(\Phi_i)_{i \in \mathbb{N}}$ *of* $\inf$*-properties, there is a* $\liminf$*-property* $\Phi$ *such that* $\sup_{i \in \mathbb{N}} \Phi_i(f) \leq \Phi(f)$*.*

We conjecture that some lim inf-property that satisfies Theorem 34 is also a lower bound on the countable supremum that occurs in the theorem. This, together with Theorem 34, would imply the converse of Theorem 33. Proving the converse of Theorem 33 would give us, thanks to the following duality, that the lim inf- and lim sup-properties characterize the second level of the Borel hierarchy of the topology induced by the safety closure operator.

**Proposition 35.** *A property* $\Phi$ *is a* $\liminf$*-property iff its complement* $\overline{\Phi}$ *is a* $\limsup$*-property.*

## **5 Quantitative Liveness**

Similarly as for safety, we take the perspective of the quantitative membership problem to define liveness: a property $\Phi$ is live iff, whenever a property value is less than $\top$, there exists a value $v$ for which the wrong hypothesis $\Phi(f) \geq v$ can never be dismissed by any finite witness $s \prec f$.

**Definition 36 (Liveness).** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is* live *iff for all* $f \in \Sigma^\omega$*, if* $\Phi(f) < \top$*, then there exists a value* $v \in \mathbb{D}$ *such that* $\Phi(f) \not\geq v$ *and for all prefixes* $s \prec f$*, we have* $\sup_{g \in \Sigma^\omega} \Phi(sg) \geq v$*.*

An equivalent definition can be given through the safety closure.

**Theorem 37.** *A property* $\Phi$ *is live iff* $\Phi^*(f) > \Phi(f)$ *for every* $f \in \Sigma^\omega$ *with* $\Phi(f) < \top$*.*

Our definition generalizes boolean liveness. A boolean property $P \subseteq \Sigma^\omega$ is live according to the classical definition [4] iff its characteristic property $\Phi_P$ is live according to our definition. Moreover, the intersection of safety and liveness contains only the single degenerate property that always outputs $\top$.


**Proposition 38.** *A property* $\Phi$ *is safe and live iff* $\Phi(f) = \top$ *for all* $f \in \Sigma^\omega$*.*

We define co-liveness symmetrically, and note that the duals of the observations above also hold for co-liveness.

**Definition 39 (Co-liveness).** *A property* $\Phi : \Sigma^\omega \to \mathbb{D}$ *is* co-live *iff for all* $f \in \Sigma^\omega$*, if* $\Phi(f) > \bot$*, then there exists a value* $v \in \mathbb{D}$ *such that* $\Phi(f) \not\leq v$ *and for all prefixes* $s \prec f$*, we have* $\inf_{g \in \Sigma^\omega} \Phi(sg) \leq v$*.*

Next, we present some examples of liveness and co-liveness properties. We start by showing that lim inf- and lim sup-properties can be live and co-live.

*Example 40.* Let $\Sigma = \{a, b\}$ be an alphabet, and let $P = \Box\Diamond a$ and $Q = \Diamond\Box b$ be boolean properties defined in linear temporal logic. Consider their characteristic properties $\Phi_P$ and $\Phi_Q$. As we pointed out earlier, our definitions generalize their boolean counterparts, therefore $\Phi_P$ and $\Phi_Q$ are both live and co-live. Moreover, $\Phi_P$ is a lim sup-property: define $\pi_P(s) = 1$ if $s \in \Sigma^* a$, and $\pi_P(s) = 0$ otherwise, and observe that $\Phi_P(f) = \limsup_{s \prec f} \pi_P(s)$ for all $f \in \Sigma^\omega$. Similarly, $\Phi_Q$ is a lim inf-property.
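On ultimately periodic traces of the form $u v^\omega$, the two limit properties of Example 40 can be evaluated exactly. The sketch below is illustrative only. The finitary property `pi_Q`, which returns 1 on finite traces ending with $b$, is an assumption, since the example does not spell out a witness for $\Phi_Q$; the helper names and the encoding of traces as Python strings over `"a"` and `"b"` are also hypothetical. Because both finitary properties depend only on the last observation, their values along $u v^\omega$ repeat with period $|v|$, so one period determines the limit superior and limit inferior.

```python
def pi_P(s):
    """1 iff the finite trace s ends with 'a' (as in Example 40)."""
    return 1 if s.endswith("a") else 0

def pi_Q(s):
    """1 iff the finite trace s ends with 'b' (an assumed witness for Phi_Q)."""
    return 1 if s.endswith("b") else 0

def period_values(pi, u, v):
    """Values of pi on the prefixes of u v^omega that end inside the second copy of v."""
    if not v:
        raise ValueError("the repeated part v must be nonempty")
    return [pi(u + v + v[:k]) for k in range(1, len(v) + 1)]

def limsup_lasso(pi, u, v):
    return max(period_values(pi, u, v))

def liminf_lasso(pi, u, v):
    return min(period_values(pi, u, v))

if __name__ == "__main__":
    # b (ab)^omega contains infinitely many a's but is not eventually always b
    print(limsup_lasso(pi_P, "b", "ab"), liminf_lasso(pi_Q, "b", "ab"))
```

This shortcut is specific to finitary properties that look only at the last observation; it is not a general procedure for limit properties.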

Now, we show that the maximal response-time property is live, and the minimal response time is co-live.

*Example 41.* Recall the co-safety property $\Phi_{\max}$ of maximal response time from Example 26. Let $f \in \Sigma^\omega$ such that $\Phi_{\max}(f) < \infty$. We can extend every prefix $s \prec f$ with $g = \mathsf{rq}\,\mathsf{tk}^\omega$, which gives us $\Phi_{\max}(sg) = \infty > \Phi_{\max}(f)$. Equivalently, for every such $f$, we have $\Phi^*_{\max}(f) = \infty > \Phi_{\max}(f)$. Hence $\Phi_{\max}$ is live and, analogously, the safety property $\Phi_{\min}$ from Example 3 is co-live.

Finally, we show that the *average response-time* property is live and co-live.

*Example 42.* Let $\Sigma = \{\mathsf{rq}, \mathsf{gr}, \mathsf{tk}, \mathsf{oo}\}$. For all $s \in \Sigma^*$, let $p(s) = 1$ if there is no pending rq in $s$, and $p(s) = 0$ otherwise. Define $\pi_{\mathrm{valid}}(s) = |\{r \preceq s \mid \exists t \in \Sigma^* : r = t\,\mathsf{rq} \land p(t) = 1\}|$ as the number of valid requests in $s$, and define $\pi_{\mathrm{time}}(s)$ as the number of tk observations that occur after a valid rq and before the matching gr. Then, $\Phi_{\mathrm{avg}} = (\pi_{\mathrm{avg}}, \liminf)$, where $\pi_{\mathrm{avg}}(s) = \frac{\pi_{\mathrm{time}}(s)}{\pi_{\mathrm{valid}}(s)}$ for all $s \in \Sigma^*$ with $\pi_{\mathrm{valid}}(s) > 0$, and $\pi_{\mathrm{avg}}(s) = \infty$ otherwise. For example, $\pi_{\mathrm{avg}}(s) = \frac{3}{2}$ for $s = \mathsf{rq}\,\mathsf{tk}\,\mathsf{gr}\,\mathsf{tk}\,\mathsf{rq}\,\mathsf{tk}\,\mathsf{rq}\,\mathsf{tk}$. Note that $\Phi_{\mathrm{avg}}$ is a lim inf-property.

The property $\Phi_{\text{avg}}$ is defined on the value domain $[0, \infty]$ and is both live and co-live. To see this, let $f \in \Sigma^\omega$ be such that $0 < \Phi_{\text{avg}}(f) < \infty$ and, for every prefix $s \prec f$, consider $g = \mathsf{rq}\,\mathsf{tk}^\omega$ and $h = \mathsf{gr}\,(\mathsf{rq}\,\mathsf{gr})^\omega$. Since $sg$ has a pending request followed by infinitely many clock ticks, we have $\Phi_{\text{avg}}(sg) = \infty$. Similarly, since $sh$ eventually has all new requests immediately granted, we get $\Phi_{\text{avg}}(sh) = 0$.
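The finitary value of Example 42 can be computed in a single pass over a finite trace. The following is a minimal sketch, assuming that a trace is given as a list of the observation names `rq`, `gr`, `tk`, `oo`, and that a `gr` clears the pending request; the function name `pi_avg` is chosen here for illustration only.

```python
from fractions import Fraction
import math


def pi_avg(s):
    """Average response time of a finite trace, following Example 42.

    `s` is a list of observations drawn from {"rq", "gr", "tk", "oo"}.
    Returns a Fraction, or math.inf when the trace has no valid request.
    """
    pending = False  # is there a request that has not been granted yet?
    valid = 0        # number of valid requests (rq with nothing pending)
    ticks = 0        # number of tk observed while a request is pending
    for obs in s:
        if obs == "rq":
            if not pending:
                valid += 1
                pending = True
        elif obs == "gr":
            pending = False
        elif obs == "tk" and pending:
            ticks += 1
    return Fraction(ticks, valid) if valid > 0 else math.inf


example = "rq tk gr tk rq tk rq tk".split()
print(pi_avg(example))  # 3/2, matching the value stated in Example 42
```

The property value of an infinite trace is the lim inf of these values over all prefixes, which this sketch does not attempt to compute.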

#### **5.1 The Quantitative Safety-Liveness Decomposition**

A celebrated theorem states that every boolean property can be expressed as an intersection of a safety property and a liveness property [4]. In this section, we prove the analogous result for the quantitative setting.

*Example 43.* Let $\Sigma = \{\mathsf{rq}, \mathsf{gr}, \mathsf{tk}, \mathsf{oo}\}$. Recall the maximal response-time property $\Phi_{\max}$ from Example 26, and the average response-time property $\Phi_{\text{avg}}$ from Example 42. Let $n > 0$ be an integer and define a new property $\Phi$ by $\Phi(f) = \Phi_{\text{avg}}(f)$ if $\Phi_{\max}(f) \leq n$, and $\Phi(f) = 0$ otherwise. For the safety closure of $\Phi$, we have $\Phi^*(f) = n$ if $\Phi_{\max}(f) \leq n$, and $\Phi^*(f) = 0$ otherwise. Now, we further define $\Psi(f) = \Phi_{\text{avg}}(f)$ if $\Phi_{\max}(f) \leq n$, and $\Psi(f) = n$ otherwise. Observe that $\Psi$ is live, because every prefix of a trace whose value is less than $n$ can be extended to a greater value. Finally, note that for all $f \in \Sigma^\omega$, we can express $\Phi(f)$ as the pointwise minimum of $\Phi^*(f)$ and $\Psi(f)$. Intuitively, the safety part $\Phi^*$ of this decomposition checks whether the maximal response time stays below the permitted bound, and the liveness part $\Psi$ keeps track of the average response time as long as the bound is satisfied.

Following a similar construction, we show that a safety-liveness decomposition exists for every property.

**Theorem 44.** *For every property $\Phi$, there exists a liveness property $\Psi$ such that $\Phi(f) = \min(\Phi^*(f), \Psi(f))$ for all $f \in \Sigma^\omega$.*

In particular, if the given property is safe or live, the decomposition is trivial.

*Remark 45.* Let $\Phi$ be a property. If $\Phi$ is safe (resp. live), then the safety (resp. liveness) part of the decomposition is $\Phi$ itself, and the liveness (resp. safety) part is the constant property that maps every trace to $\top$.

For co-safety and co-liveness, the duals of Theorem 44 and Remark 45 hold. In particular, every property is the pointwise maximum of its co-safety closure and a co-liveness property.

#### **5.2 Related Definitions of Quantitative Liveness**

In [41], the authors define a property $\Phi$ as *multi-live* iff $\Phi^*(f) > \bot$ for all $f \in \Sigma^\omega$. We show that our definition is more restrictive, resulting in fewer liveness properties while still allowing a safety-liveness decomposition.

**Proposition 46.** *Every live property is multi-live, and the inclusion is strict.*

We provide a separating example on a totally ordered domain below.

*Example 47.* Let $\Sigma = \{a, b, c\}$, and consider the following property: $\Phi(f) = 0$ if $f \models \Box a$, and $\Phi(f) = 1$ if $f \models \Diamond c$, and $\Phi(f) = 2$ otherwise (i.e., if $f \models \Diamond b \land \Box\neg c$). For all $f \in \Sigma^\omega$ and prefixes $s \prec f$, we have $\Phi(sc^\omega) = 1$. Thus $\Phi^*(f) > \bot$, which implies that $\Phi$ is multi-live. However, $\Phi$ is not live. Indeed, for every $f \in \Sigma^\omega$ such that $f \models \Diamond c$, we have $\Phi(f) = 1 < \top$. Moreover, $f$ admits some prefix $s$ that contains an occurrence of $c$, thus satisfying $\sup_{g \in \Sigma^\omega} \Phi(sg) = 1$.

In [27], the authors define a property $\Phi$ as *verdict-live* iff for every $f \in \Sigma^\omega$ and value $v \not\leq \Phi(f)$, every prefix $s \prec f$ satisfies $\Phi(sg) = v$ for some $g \in \Sigma^\omega$. We show that our definition is more liberal.


**Proposition 48.** *Every verdict-live property is live, and the inclusion is strict.*

We provide a separating example below, concluding that our definition is strictly more general even for totally ordered domains.

*Example 49.* Let $\Sigma = \{a, b\}$, and consider the following property: $\Phi(f) = 0$ if $f \not\models \Diamond b$, and $\Phi(f) = 1$ if $f \models \Diamond(b \land \bigcirc\Diamond b)$, and $\Phi(f) = 2^{-|s|}$ otherwise, where $s \prec f$ is the shortest prefix in which $b$ occurs. Consider an arbitrary $f \in \Sigma^\omega$. If $\Phi(f) = 1$, then the liveness condition is vacuously satisfied. If $\Phi(f) = 0$, then $f = a^\omega$, and every prefix $s \prec f$ can be extended with $g = ba^\omega$ or $h = b^\omega$ to obtain $\Phi(sg) = 2^{-(|s|+1)}$ and $\Phi(sh) = 1$. If $0 < \Phi(f) < 1$, then $f$ satisfies $\Diamond b$ but not $\Diamond(b \land \bigcirc\Diamond b)$, and every prefix $s \prec f$ can be extended with $b^\omega$ to obtain $\Phi(sb^\omega) = 1$. Hence $\Phi$ is live. However, $\Phi$ is not verdict-live. To see this, consider the trace $f = a^k b a^\omega$ for some integer $k \geq 1$ and note that $\Phi(f) = 2^{-(k+1)}$. Although all prefixes of $f$ can be extended to reach the value 1, the value domain contains elements between $\Phi(f)$ and 1, namely the values $2^{-m}$ for $1 \leq m \leq k$. Each of these values can be rejected after reading a finite prefix of $f$, because for $n \geq m$ it is not possible to extend $a^n$ to reach the value $2^{-m}$.

## **6 Approximate Monitoring through Approximate Safety**

In this section, we consider properties on the extended reals $\mathbb{R}^{\pm\infty} = \mathbb{R} \cup \{-\infty, +\infty\}$. We denote by $\mathbb{R}_{\geq 0}$ the set of nonnegative real numbers.

**Definition 50 (Approximate safety and co-safety).** *Let $\alpha \in \mathbb{R}_{\geq 0}$. A property $\Phi$ is* $\alpha$-safe *iff for every $f \in \Sigma^\omega$ and value $v \in \mathbb{R}^{\pm\infty}$ with $\Phi(f) < v$, there exists a prefix $s \prec f$ such that $\sup_{g \in \Sigma^\omega} \Phi(sg) < v + \alpha$. Similarly, $\Phi$ is* $\alpha$-co-safe *iff for every $f \in \Sigma^\omega$ and $v \in \mathbb{R}^{\pm\infty}$ with $\Phi(f) > v$, there exists $s \prec f$ such that $\inf_{g \in \Sigma^\omega} \Phi(sg) > v - \alpha$. When $\Phi$ is $\alpha$-safe (resp. $\alpha$-co-safe) for some $\alpha \in \mathbb{R}_{\geq 0}$, we say that $\Phi$ is* approximately safe *(resp.* approximately co-safe*).*

Approximate safety can be characterized through the following relation with the safety closure.

**Proposition 51.** *For every error bound $\alpha \in \mathbb{R}_{\geq 0}$, a property $\Phi$ is $\alpha$-safe iff $\Phi^*(f) - \Phi(f) \leq \alpha$ for all $f \in \Sigma^\omega$.*

An analogue of Proposition 51 holds for approximate co-safety and the co-safety closure. Moreover, approximate safety and approximate co-safety are dual notions that are connected by the complement operation, similarly to their precise counterparts (Theorem 27).

#### **6.1 The Intersection of Approximate Safety and Co-safety**

Recall the ghost monitor from the introduction. If, after a finite number of observations, all the possible prediction values are close enough, then we can simply freeze the current value and achieve a sufficiently small error. This happens for properties that are both approximately safe and approximately co-safe, generalizing the unfolding approximation of discounted properties [13].

**Proposition 52.** *For every limit property $\Phi$ and all error bounds $\alpha, \beta \in \mathbb{R}_{\geq 0}$, if $\Phi$ is $\alpha$-safe and $\beta$-co-safe, then the set $S_\delta = \{s \in \Sigma^* \mid \sup_{r_1 \in \Sigma^*} \Phi(sr_1) - \inf_{r_2 \in \Sigma^*} \Phi(sr_2) \geq \delta\}$ is finite for all reals $\delta > \alpha + \beta$.*

Based on this proposition, we show that, for limit properties that are both approximately safe and approximately co-safe, the influence of the suffix on the property value is eventually negligible.

**Theorem 53.** *For every limit property $\Phi$ such that $\Phi(f) \in \mathbb{R}$ for all $f \in \Sigma^\omega$, and for all error bounds $\alpha, \beta \in \mathbb{R}_{\geq 0}$, if $\Phi$ is $\alpha$-safe and $\beta$-co-safe, then for every real $\delta > \alpha + \beta$ and trace $f \in \Sigma^\omega$, there is a prefix $s \prec f$ such that for all continuations $w \in \Sigma^* \cup \Sigma^\omega$, we have $|\Phi(sw) - \Phi(s)| < \delta$.*

We illustrate this theorem with a *discounted safety* property.

*Example 54.* Let $P \subseteq \Sigma^\omega$ be a boolean safety property. We define the finitary property $\pi_P : \Sigma^* \to [0, 1]$ as follows: $\pi_P(s) = 1$ if $sf \in P$ for some $f \in \Sigma^\omega$, and $\pi_P(s) = 1 - 2^{-|r|}$ otherwise, where $r \preceq s$ is the shortest prefix with $rf \notin P$ for all $f \in \Sigma^\omega$. The limit property $\Phi = (\pi_P, \inf)$ is called *discounted safety* [3]. Because $\Phi$ is an $\inf$-property, it is safe by Theorem 20. Now consider the finitary property $\pi'_P$ defined by $\pi'_P(s) = 1 - 2^{-|s|}$ if $sf \in P$ for some $f \in \Sigma^\omega$, and $\pi'_P(s) = 1 - 2^{-|r|}$ otherwise, where $r \preceq s$ is the shortest prefix with $rf \notin P$ for all $f \in \Sigma^\omega$. Let $\Phi' = (\pi'_P, \sup)$, and note that $\Phi(f) = \Phi'(f)$ for all $f \in \Sigma^\omega$. Hence $\Phi$ is also co-safe, because it coincides with a $\sup$-property.

Let $f \in \Sigma^\omega$ and $\delta > 0$. For every prefix $s \prec f$, the set of possible prediction values is either the range $[1 - 2^{-|s|}, 1]$ or the singleton $\{1 - 2^{-|r|}\}$, where $r \preceq s$ is chosen as above. In the latter case, we have $|\Phi(sw) - \Phi(s)| = 0 < \delta$ for all $w \in \Sigma^* \cup \Sigma^\omega$. In the former case, since the range becomes smaller as the prefix grows, there is a prefix $s' \prec f$ with $2^{-|s'|} < \delta$, which yields $|\Phi(s'w) - \Phi(s')| < \delta$ for all $w \in \Sigma^* \cup \Sigma^\omega$.

#### **6.2 Finite-state Approximate Monitoring**

Monitors with finite state spaces are particularly desirable, because finite automata enjoy a plethora of desirable closure and decidability properties. Here, we prove that properties that are both approximately safe and approximately co-safe can be monitored approximately by a finite-state monitor. First, we recall the notion of abstract quantitative monitor from [30].

A binary relation $\sim$ over $\Sigma^*$ is an *equivalence relation* iff it is reflexive, symmetric, and transitive. Such a relation is *right-monotonic* iff $s_1 \sim s_2$ implies $s_1 r \sim s_2 r$ for all $s_1, s_2, r \in \Sigma^*$. For an equivalence relation $\sim$ over $\Sigma^*$ and a finite trace $s \in \Sigma^*$, we write $[s]_\sim$ for the equivalence class of $\sim$ to which $s$ belongs. When $\sim$ is clear from the context, we write $[s]$ instead. We denote by $\Sigma^*/\sim$ the quotient of the relation $\sim$.

**Definition 55 (Abstract monitor [30]).** *An* abstract monitor $\mathcal{M} = (\sim, \gamma)$ *is a pair consisting of a right-monotonic equivalence relation $\sim$ on $\Sigma^*$ and a function $\gamma : (\Sigma^*/\sim) \to \mathbb{R}^{\pm\infty}$. The monitor $\mathcal{M}$ is* finite-state *iff the relation $\sim$ has finitely many equivalence classes. Let $\delta_{\text{fin}}, \delta_{\lim} \in \mathbb{R}^{\pm\infty}$ be error bounds. We say that $\mathcal{M}$ is a* $(\delta_{\text{fin}}, \delta_{\lim})$-monitor *for a given limit property $\Phi = (\pi, \ell)$ iff for all $s \in \Sigma^*$ and $f \in \Sigma^\omega$, we have $|\pi(s) - \gamma([s])| \leq \delta_{\text{fin}}$ and $|\ell_{s \prec f}(\pi(s)) - \ell_{s \prec f}(\gamma([s]))| \leq \delta_{\lim}$.*

Building on Theorem 53, we identify a sufficient condition to guarantee the existence of an abstract monitor with finitely many equivalence classes.

**Theorem 56.** *For every limit property $\Phi$ such that $\Phi(f) \in \mathbb{R}$ for all $f \in \Sigma^\omega$, and for all error bounds $\alpha, \beta \in \mathbb{R}_{\geq 0}$, if $\Phi$ is $\alpha$-safe and $\beta$-co-safe, then for every real $\delta > \alpha + \beta$, there exists a finite-state $(\delta, \delta)$-monitor for $\Phi$.*

Due to Theorem 56, the discounted safety property of Example 54 has a finite-state monitor for every positive error bound. We remark that Theorem 56 is proved by a construction that generalizes the unfolding approach for the approximate determinization of discounted automata [12], which unfolds an automaton until the distance constraint is satisfied.
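To make the freezing idea concrete, the following sketch builds a finite-state approximate monitor for one instance of discounted safety. It is an illustration under stated assumptions, not the construction used in the proof of Theorem 56: the safety property is the hypothetical "the letter `b` never occurs", and the names `pi_P` and `FrozenMonitor` are introduced here. The monitor counts steps only up to a bound `K` with $2^{-K} < \delta$, so it needs at most $2(K+1)$ states.

```python
def pi_P(s, bad="b"):
    """Exact finitary value of Example 54 for the assumed property 'bad never occurs'."""
    for i, letter in enumerate(s):
        if letter == bad:
            # The shortest prefix with no extension in P ends at the first bad letter.
            return 1 - 2.0 ** -(i + 1)
    return 1.0  # s can still be extended to a trace in P


class FrozenMonitor:
    """Finite-state monitor: a step counter saturating at K plus a violation flag."""

    def __init__(self, delta, bad="b"):
        self.bad = bad
        self.K = 0
        while 2.0 ** -self.K >= delta:  # smallest K with 2^-K < delta
            self.K += 1
        self.steps = 0        # number of letters read, capped at K
        self.violated = False
        self.value = 1.0      # current output of the monitor

    def step(self, letter):
        if not self.violated:
            self.steps = min(self.steps + 1, self.K)
            if letter == self.bad:
                self.violated = True
                # Exact if the violation occurs within K steps, otherwise
                # off by less than 2^-K, which is below delta.
                self.value = 1 - 2.0 ** -self.steps
        return self.value


monitor = FrozenMonitor(delta=0.1)
word = "a" * 10 + "b" + "a" * 3
errors = [abs(pi_P(word[: i + 1]) - monitor.step(x)) for i, x in enumerate(word)]
print(max(errors) < 0.1)  # True
```

Once the counter saturates, the monitor can no longer distinguish late violations from one another, which is exactly where the bounded error comes from.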

## **7 Conclusion**

We presented a generalization of safety and liveness that lifts the safety-progress hierarchy to the quantitative setting of [18] while preserving major desirable features of the boolean setting, such as the safety-liveness decomposition.

Monitorability identifies a boundary separating properties that can be verified or falsified from a finite number of observations from those that cannot. Safety-liveness and co-safety-co-liveness decompositions allow us to separate, for an individual property, monitorable parts from nonmonitorable parts. The larger the monitorable parts of the given property, the stronger the decomposition. We provided the strongest known safety-liveness decomposition, which consists of a pointwise minimum between a safe part defined by a quantitative safety closure, and a live part which corrects for the difference. We then defined approximate safety as the relaxation of safety by a parametric error bound. This further increases the monitorability of properties and offers monitorability at a parametric cost. In fact, we showed that every property that is both approximately safe and approximately co-safe can be monitored arbitrarily precisely by a finite-state monitor. A future direction is to extend our decomposition to approximate safety together with support for quantitative assumptions [32].

The literature contains efficient model-checking procedures that leverage the boolean safety hypothesis [36,40]. We thus expect that also quantitative safety and co-safety, and their approximations, enable efficient verification algorithms for quantitative properties.

**Acknowledgments.** We thank the anonymous reviewers for their helpful comments. This work was supported in part by the ERC-2020-AdG 101020093.

## **References**




the Joint European Conference on Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings. Lecture Notes in Computer Science, vol. 2280, pp. 342–356. Springer (2002). https://doi.org/10.1007/3-540-46002-0\_24


vol. 2648, pp. 74–88. Springer (2003). https://doi.org/10.1007/3-540-44829-2\_5


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## On the Comparison of Discounted-Sum Automata with Multiple Discount Factors

Udi Boker and Guy Hefetz

Reichman University, Herzliya, Israel udiboker@runi.ac.il, ghefetz@gmail.com

Abstract. We look into the problems of comparing nondeterministic discounted-sum automata on finite and infinite words. That is, the problems of checking for automata A and B whether or not it holds that for all words w, A(w) = B(w), A(w) ≤ B(w), or A(w) < B(w).

These problems are known to be decidable when both automata have the same single integral discount factor, while decidability is open in all other settings: when the single discount factor is a non-integral rational; when each automaton can have multiple discount factors; and even when each has a single integral discount factor, but the two are different.

We show that it is undecidable to compare discounted-sum automata with multiple discount factors, even if all are integral, while it is decidable to compare them if each has a single, possibly different, integral discount factor. To this end, we also provide algorithms to check for a given nondeterministic automaton N and deterministic automaton D, each with a single, possibly different, rational discount factor, whether or not N(w) = D(w), N(w) ≥ D(w), or N(w) > D(w) for all words w.

Keywords: Discounted-sum Automata · Comparison · Containment

## 1 Introduction

Equivalence and containment checks of Boolean automata, namely the checks of whether L(A) = L(B), L(A) ⊆ L(B), or L(A) ⊂ L(B), where L(A) and L(B) are the languages that A and B recognize, are central in the usage of automata theory in diverse areas, and in particular in formal verification (e.g., [34,26,17,33,35,28]). Likewise, comparison of quantitative automata, which extends the equivalence and containment checks by asking whether A(w) = B(w), whether A(w) ≤ B(w), or whether A(w) < B(w) for all words w, is essential for harnessing quantitative-automata theory to the service of diverse fields and in particular to the service of quantitative formal verification (e.g., [15,14,21,11,27,3,5,22]).

Discounted summation is a common valuation function in quantitative automata theory (e.g., [19,12,14,15]), as well as in various other computational models, such as games (e.g., [37,4,1]), Markov decision processes (e.g., [23,29,16]), and reinforcement learning (e.g., [32,36]), as it formalizes the concept that an immediate reward is better than a potential one in the far future, as well as that a potential problem (such as a bug in a reactive system) in the far future is less troubling than a current one.

Research supported by the Israel Science Foundation grant 2410/22.

A nondeterministic discounted-sum automaton (NDA) has rational weights on the transitions, and a fixed rational discount factor $\lambda > 1$. The value of a (finite or infinite) run is the discounted summation of the weights on the transitions, such that the weight in the $i$th transition of the run is divided by $\lambda^i$. The value of a (finite or infinite) word is the infimum value of the automaton runs on it. An NDA thus realizes a function from words to real numbers.
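As a small illustration of this valuation, the sketch below computes the value of a finite run from its list of weights, and the value of a finite word as the minimum over explicitly listed runs. It assumes that transitions are indexed from 0, so that the first weight is not discounted; the function names are chosen here for illustration.

```python
from fractions import Fraction


def discounted_sum(weights, lam):
    """Value of a finite run: the i-th weight (i = 0, 1, ...) is divided by lam**i."""
    lam = Fraction(lam)
    return sum(Fraction(w) / lam**i for i, w in enumerate(weights))


def word_value(runs, lam):
    """Value of a finite word, given the weight sequences of all runs on it.

    On finite words there are finitely many runs, so the infimum is a minimum.
    """
    return min(discounted_sum(ws, lam) for ws in runs)


print(discounted_sum([1, 1, 1], 2))              # 7/4 = 1 + 1/2 + 1/4
print(word_value([[1, 1, 1], [0, 4, 0]], 2))     # 7/4, since the other run has value 2
```

Exact rational arithmetic is used because comparison questions such as A(w) = B(w) are sensitive to rounding.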

NDAs cannot always be determinized [15], they are not closed under basic algebraic operations [8], and their comparison is not known to be decidable, relating to various longstanding open problems [9]. However, restricting NDAs to have an integral discount factor $\lambda \in \mathbb{N} \setminus \{0, 1\}$ provides a robust class of automata that is closed under determinization and under algebraic operations, and for which comparison is decidable [8].

Various variants of NDAs are studied in the literature, among which are *functional*, *k-valued*, *probabilistic*, and more [21,20,13]. Yet, until recently, all of these models were restricted to have a single discount factor. This is a significant restriction of the general discounted-summation paradigm, in which multiple discount factors are considered. For example, Markov decision processes and discounted-sum games allow multiple discount factors within the same entity [23,4]. In [6], NDAs were extended to NMDAs, allowing for multiple discount factors, where each transition can have a different one. Special attention was given to integral NMDAs, namely to those with only integral discount factors, analyzing whether they preserve the good properties of integral NDAs. It was shown that they are generally not closed under determinization and under algebraic operations, while a restricted class of them, named tidy-NMDAs, in which the choice of discount factors depends on the prefix of the word read so far, does preserve the good properties of integral NDAs.

While comparison of tidy-NMDAs with the same choice function is decidable in PSPACE [6], it was left open whether comparison of general integral NMDAs A and B is decidable. It is even open whether comparison of two integral NDAs with different (single) discount factors is decidable.

We show that it is undecidable to resolve for a given NMDA N and deterministic NMDA (DMDA) D, even if both have only integral discount factors, on both finite and infinite words, whether N ≡ D and whether N ≤ D, and on finite words also whether N < D. We prove the undecidability result by reduction from the halting problem of two-counter machines. The general scheme follows similar reductions, such as in [18,2], yet the crux is in simulating a counter by integral NMDAs. Upfront, discounted summation is not suitable for simulating counters, since a current increment has, in the discounted setting, a much higher influence than that of a far-away decrement. However, we show that multiple discount factors allow, in a sense, eliminating the influence of time, yielding automata in which no matter where a letter appears in the word, it has the same influence on the automaton value. (See Lemma 1 and Fig. 3.) Another main part of the proof is in showing how to nondeterministically adjust the automaton weights and discount factors in order to "detect" whether a counter is at a current value 0. (See Figs. 5, 6, 8 and 9.)

On the positive side, we provide algorithms to decide for a given NDA N and deterministic NDA (DDA) D, with arbitrary, possibly different, rational discount factors, whether N ≡ D, N ≥ D, or N > D (Theorem 4). Our algorithms work on both finite and infinite words, and run in PSPACE when the automata weights are represented in binary and their discount factors in unary. Since integral NDAs can always be determinized [8], our method also provides an algorithm to compare two integral NDAs, though not necessarily in PSPACE, since determinization might exponentially increase the number of states. (Even though determinization of NDAs is in PSPACE [8,6], the exponential number of states might require exponential space in our algorithms for comparing NDAs with different discount factors.)

The challenge with comparing automata with different discount factors comes from the combination of their different accumulations, which tends to be intractable, resulting in the undecidability of comparing integral NMDAs, and in the open problems of comparing rational NDAs and of analyzing the representation of numbers in a non-integral basis [30,24,25,9]. Yet, the main observation underlying our algorithm is that when each automaton has a single discount factor, we may unfold the combination of their computation trees only up to some level k, after which we can analyze their continuation separately, first handling the automaton with the lower (slower decreasing) discount factor and then the other one. The idea is that after level k, since the accumulated discounting of the second automaton is already much more significant, even a single non-optimal transition of the first automaton cannot be compensated by a continuation that is better with respect to the second automaton. We thus compute the optimal suffix words and runs of the first automaton from level k, on top of which we compute the optimal runs of the second automaton.

## 2 Preliminaries

*Words.* An *alphabet* Σ is an arbitrary finite set, and a *word* over Σ is a finite or infinite sequence of letters in Σ, with ε for the empty word. We denote the concatenation of a finite word u and a finite or infinite word w by u·w, or simply by uw. We define $\Sigma^+$ to be the set of all finite words except the empty word, i.e., $\Sigma^+ = \Sigma^* \setminus \{\varepsilon\}$. For a word $w = \sigma_0 \sigma_1 \sigma_2 \cdots$ and indexes $i \leq j$, we denote the *letter at index* i as $w[i] = \sigma_i$, and the *sub-word from* i *to* j as $w[i..j] = \sigma_i \sigma_{i+1} \cdots \sigma_j$.

For a finite word w and letter σ ∈ Σ, we denote the number of occurrences of σ in w by #(σ, w), and for a set S ⊆ Σ, we denote $\sum_{\sigma \in S} \#(\sigma, w)$ by #(S, w).

For a finite or infinite word w and a letter σ ∈ Σ, we define the *prefix of* w *up to* σ, $\mathrm{pref}_\sigma(w)$, as the minimal prefix of w that contains a σ letter if there is a σ letter in w, or w itself if it does not contain any σ letters. Formally,

$$\mathrm{pref}_\sigma(w) = \begin{cases} w[0..\min\{i \mid w[i] = \sigma\}] & \text{if } \exists i \mid w[i] = \sigma \\ w & \text{otherwise.} \end{cases}$$
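For finite words, the two notations above translate directly into code. The following sketch assumes that words are Python strings with one character per letter; the names `count` and `pref` are chosen here for illustration.

```python
def count(S, w):
    """#(S, w): number of positions of the finite word w whose letter lies in the set S."""
    return sum(1 for letter in w if letter in S)


def pref(sigma, w):
    """pref_sigma(w): the minimal prefix of w containing sigma, or w if sigma is absent."""
    for i, letter in enumerate(w):
        if letter == sigma:
            return w[: i + 1]  # w[0..i], inclusive of the first sigma
    return w


print(pref("b", "aabab"))           # aab
print(pref("c", "aab"))             # aab (no c occurs, so the whole word is returned)
print(count({"a"}, "aabab"))        # 3
print(count({"a", "b"}, "aabcab"))  # 5
```

The single-letter count #(σ, w) is the special case `count({sigma}, w)`.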

*Automata.* A nondeterministic discounted-sum automaton (NDA) [15] is an automaton with rational weights on the transitions, and a fixed rational discount factor $\lambda > 1$. A nondeterministic discounted-sum automaton with multiple discount factors (NMDA) [6] is similar to an NDA, but with possibly a different discount factor on each of its transitions. They are formally defined as follows:

Definition 1 ([6]). A nondeterministic discounted-sum automaton with multiple discount factors (NMDA), on finite or infinite words, is a tuple $\mathcal{A} = \langle \Sigma, Q, \iota, \delta, \gamma, \rho \rangle$ over an alphabet $\Sigma$, with a finite set of states $Q$, an initial set of states $\iota \subseteq Q$, a transition function $\delta \subseteq Q \times \Sigma \times Q$, a weight function $\gamma : \delta \to \mathbb{Q}$, and a discount-factor function $\rho : \delta \to \mathbb{Q} \cap (1, \infty)$, assigning to each transition its discount factor, which is a rational greater than one.<sup>1</sup>


For example, the value of the run $r_1 = q_0, a, q_0, a, q_1, b, q_2$ of $\mathcal{A}$ from Fig. 1 is $\mathcal{A}(r_1) = 1 + \frac{1}{2} \cdot \frac{1}{3} + 2 \cdot \frac{1}{2 \cdot 3} = \frac{3}{2}$.
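The arithmetic in this example can be reproduced with a short sketch in which each weight is divided by the product of the discount factors of all earlier transitions. Since Fig. 1 is not reproduced here, the split of the displayed sum into per-transition weights and discount factors is one reading consistent with that sum, and the discount factor of the last transition is a hypothetical placeholder (it does not affect the value of a finite run).

```python
from fractions import Fraction


def nmda_run_value(transitions):
    """Value of a finite run given as a list of (weight, discount_factor) pairs."""
    value = Fraction(0)
    accumulated = Fraction(1)  # product of the discount factors seen so far
    for weight, rho in transitions:
        value += Fraction(weight) / accumulated
        accumulated *= Fraction(rho)
    return value


# Assumed reading of the run r1: weights 1, 1/2, 2 and discount factors 3, 2, and
# a placeholder 2 for the last transition.
r1 = [(1, 3), (Fraction(1, 2), 2), (2, 2)]
print(nmda_run_value(r1))  # 3/2
```

With a single discount factor on all transitions, this reduces to the usual NDA valuation.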


*Counter machines.* A two-counter machine [31] $M$ is a sequence $(l_1, \ldots, l_n)$ of commands, for some $n \in \mathbb{N}$, involving two counters $x$ and $y$. We refer to $\{1, \ldots, n\}$ as the locations of the machine. For every $i \in \{1, \ldots, n\}$ we refer to $l_i$ as the command in location $i$. There are five possible forms of commands:

inc(c), dec(c), goto $l_k$, if c=0 goto $l_k$ else goto $l_{k'}$, halt,

where $c \in \{x, y\}$ is a counter and $1 \leq k, k' \leq n$ are locations. To avoid decreasing a zero-valued counter $c \in \{x, y\}$, every dec(c) command is preceded by the command if c=0 goto <current\_line> else goto <next\_line>, and there are no other direct goto-commands to it. The counters are initially set to 0. An example of a two-counter machine is given in Fig. 2.

<sup>1</sup> Discount factors are sometimes defined as numbers between 0 and 1, under which setting weights are multiplied by these factors rather than divided by them.

Fig. 1. An NMDA $\mathcal{A}$. The labels on the transitions indicate the alphabet letter, the weight of the transition, and its discount factor.

> $l_1$. inc(x)
>
> $l_2$. inc(x)
>
> $l_3$. if x=0 goto $l_3$ else goto $l_4$
>
> $l_4$. dec(x)
>
> $l_5$. if x=0 goto $l_6$ else goto $l_3$
>
> $l_6$. halt

Fig. 2. An example of a two-counter machine.

Let $L$ be the set of possible commands in $M$. Then a *run* of $M$ is a sequence $\psi = \psi_1, \ldots, \psi_m \in (L \times \mathbb{N} \times \mathbb{N})^*$ such that the following hold:

- $\psi_1 = \langle l_1, 0, 0 \rangle$.
- For all $1 < i \leq m$, let $\psi_{i-1} = \langle l_j, \alpha_x, \alpha_y \rangle$ and $\psi_i = \langle l', \alpha'_x, \alpha'_y \rangle$. Then, the following hold.
	- If $l_j$ is an inc(x) command (resp. inc(y)), then $\alpha'_x = \alpha_x + 1$, $\alpha'_y = \alpha_y$ (resp. $\alpha'_y = \alpha_y + 1$, $\alpha'_x = \alpha_x$), and $l' = l_{j+1}$.
	- If $l_j$ is dec(x) (resp. dec(y)), then $\alpha'_x = \alpha_x - 1$, $\alpha'_y = \alpha_y$ (resp. $\alpha'_y = \alpha_y - 1$, $\alpha'_x = \alpha_x$), and $l' = l_{j+1}$.
	- If $l_j$ is goto $l_k$, then $\alpha'_x = \alpha_x$, $\alpha'_y = \alpha_y$, and $l' = l_k$.
	- If $l_j$ is if x=0 goto $l_k$ else goto $l_{k'}$, then $\alpha'_x = \alpha_x$, $\alpha'_y = \alpha_y$, and $l' = l_k$ if $\alpha_x = 0$, and $l' = l_{k'}$ otherwise.
	- If $l_j$ is if y=0 goto $l_k$ else goto $l_{k'}$, then $\alpha'_x = \alpha_x$, $\alpha'_y = \alpha_y$, and $l' = l_k$ if $\alpha_y = 0$, and $l' = l_{k'}$ otherwise.
	- If $l'$ is halt, then $i = m$, namely a run does not continue after halt.

If, in addition, we have that $\psi_m = \langle l_j, \alpha_x, \alpha_y \rangle$ such that $l_j$ is a halt command, we say that $\psi$ is a *halting run*. We say that a machine $M$ 0-halts if its run is halting and ends in $\langle l, 0, 0 \rangle$. We say that a sequence of commands $\tau \in L^*$ *fits* a run $\psi$ if $\tau$ is the projection of $\psi$ on its first component.

The *command trace* $\pi = \sigma_1, \ldots, \sigma_m$ of a halting run $\psi = \psi_1, \ldots, \psi_m$ describes the flow of the run, including a description of whether a counter c was equal to 0 or larger than 0 in each occurrence of an if c=0 goto $l_k$ else goto $l_{k'}$ command. It is formally defined as follows: $\sigma_m = $ halt, and for every $1 < i \leq m$, we define $\sigma_{i-1}$ according to $\psi_{i-1} = (l_j, \alpha_x, \alpha_y)$ in the following manner:


For example, the command trace of the halting run of the machine in Fig. 2 is inc(x), inc(x), (goto $l\_4$, x > 0), dec(x), (goto $l\_3$, x > 0), (goto $l\_4$, x > 0), dec(x), (goto $l\_6$, x = 0), halt.
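The example trace can be reproduced with a short sketch. The command encoding and the function name `command_trace` are assumptions made here, and the program `fig2_like` is a machine chosen to be consistent with the trace above; Fig. 2 itself is not reproduced in this text, so it should not be read as the authors' exact figure.

```python
# Hypothetical encoding: ("inc", c), ("dec", c), ("goto", k),
# ("if0", c, k, k_else), ("halt",); locations are 1-based.

def command_trace(program, max_steps=1000):
    """Return the command trace of the halting run, or None if halt is not
    reached within max_steps."""
    j, vals, trace = 1, {"x": 0, "y": 0}, []
    for _ in range(max_steps):
        cmd = program[j - 1]
        kind = cmd[0]
        if kind == "halt":
            trace.append("halt")
            return trace
        if kind in ("inc", "dec"):
            c = cmd[1]
            vals[c] += 1 if kind == "inc" else -1
            trace.append(f"{kind}({c})")
            j += 1
        elif kind == "goto":
            trace.append(f"goto l{cmd[1]}")
            j = cmd[1]
        else:
            # Conditional jump: record the target together with the test outcome.
            _, c, k, k_else = cmd
            if vals[c] == 0:
                trace.append(f"(goto l{k}, {c} = 0)")
                j = k
            else:
                trace.append(f"(goto l{k_else}, {c} > 0)")
                j = k_else
    return None


# Assumed program, consistent with the example trace.
fig2_like = [
    ("inc", "x"),          # l1
    ("inc", "x"),          # l2
    ("if0", "x", 3, 4),    # l3: if x=0 goto l3 else goto l4
    ("dec", "x"),          # l4
    ("if0", "x", 6, 3),    # l5: if x=0 goto l6 else goto l3
    ("halt",),             # l6
]
trace = command_trace(fig2_like)
```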

Deciding whether a given counter machine M halts is known to be undecidable [31]. Deciding whether M halts with both counters having value 0, termed the *0-halting problem*, is also undecidable. Indeed, the halting problem can be reduced to the latter by adding some commands that clear the counters before every halt command.
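One way to realize the counter-clearing step is sketched below. The paper does not spell out the added commands, so the seven-command clearing block, the function name `clear_before_halt`, and the command encoding are all assumptions of this sketch.

```python
# Hypothetical encoding: ("inc", c), ("dec", c), ("goto", k),
# ("if0", c, k, k_else), ("halt",); locations are 1-based.

def clear_before_halt(program):
    """Replace every halt by a block that decrements x and then y down to 0
    and only then halts; jump targets are renumbered accordingly."""
    block = 7                                  # length of the clearing block
    new_loc, pos = {}, 1                       # old location -> new location
    for j, cmd in enumerate(program, start=1):
        new_loc[j] = pos
        pos += block if cmd[0] == "halt" else 1
    out = []
    for j, cmd in enumerate(program, start=1):
        b = new_loc[j]
        if cmd[0] == "halt":
            out += [
                ("if0", "x", b + 3, b + 1),    # b:   x = 0 ? go clear y : dec x
                ("dec", "x"),                  # b+1
                ("goto", b),                   # b+2
                ("if0", "y", b + 6, b + 4),    # b+3: y = 0 ? halt : dec y
                ("dec", "y"),                  # b+4
                ("goto", b + 3),               # b+5
                ("halt",),                     # b+6
            ]
        elif cmd[0] == "goto":
            out.append(("goto", new_loc[cmd[1]]))
        elif cmd[0] == "if0":
            out.append(("if0", cmd[1], new_loc[cmd[2]], new_loc[cmd[3]]))
        else:
            out.append(cmd)
    return out


cleared = clear_before_halt([("inc", "x"), ("inc", "y"), ("halt",)])
```

The transformed machine reaches its halt command exactly when the original one does, and then both counters are 0.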

## 3 Comparison of NMDAs

We show that comparison of (integral) NMDAs is undecidable by reduction from the halting problem of two-counter machines. Notice that our NMDAs only use integral discount factors, while they do have non-integral weights. Yet, weights can be easily changed to integers as well, by multiplying them all by a common denominator and making the corresponding adjustments in the calculations.

We start with a lemma on the accumulated value of certain series of discount factors and weights. Observe that by the lemma, no matter where the pair of discount factor $\lambda \in \mathbb{N} \setminus \{0, 1\}$ and weight $w = \frac{\lambda - 1}{\lambda}$ appears along the run, it has the same effect on the accumulated value. This property will play a key role in simulating counting by NMDAs.

Lemma 1. *For every sequence* $\lambda\_1, \ldots, \lambda\_m$ *of integers larger than* 1 *and weights* $w\_1, \ldots, w\_m$ *such that* $w\_i = \frac{\lambda\_i - 1}{\lambda\_i}$*, we have* $\sum\_{i=1}^{m} \left( w\_i \cdot \prod\_{j=1}^{i-1} \frac{1}{\lambda\_j} \right) = 1 - \frac{1}{\prod\_{j=1}^{m} \lambda\_j}$*.*

The proof is by induction on m and appears in [7].
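The identity of Lemma 1 can be checked numerically with exact rational arithmetic. The following is only a sanity check on one sample sequence, not a proof; the function name `accumulated_value` and the sample discount factors are chosen here for illustration.

```python
from fractions import Fraction
from itertools import permutations


def accumulated_value(lambdas):
    """Sum of w_i * prod_{j<i} 1/lambda_j, with w_i = (lambda_i - 1)/lambda_i."""
    total, discount = Fraction(0), Fraction(1)
    for lam in lambdas:
        total += Fraction(lam - 1, lam) * discount
        discount /= lam
    return total


lams = [5, 4, 7, 15]
value = accumulated_value(lams)
closed_form = 1 - Fraction(1, 5 * 4 * 7 * 15)
# The value depends only on the product of the discount factors,
# so every reordering of the sequence yields the same result.
same_for_all_orders = all(accumulated_value(p) == value for p in permutations(lams))
```

The order independence is what makes the pairs usable as counters: only how many times each discount factor occurs matters, not where.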

#### 3.1 The Reduction

We turn to our reduction from the halting problem of two-counter machines to the problem of NMDA containment. We provide the construction and the correctness lemma with respect to automata on finite words, and then show in Section 3.2 how to use the same construction also for automata on infinite words.

Given a two-counter machine M with the commands $(l\_1, \ldots, l\_n)$, we construct an integral DMDA A and an integral NMDA B on finite words, such that M 0-halts iff there exists a word $w \in \Sigma^{+}$ such that B(w) ≥ A(w) iff there exists a word $w \in \Sigma^{+}$ such that B(w) > A(w).

The automata A and B operate over the following alphabet Σ, which consists of 5n + 5 letters, standing for the possible elements in a command trace of M:

$$\begin{aligned} \Sigma^{\text{incdec}} &= \left\{ \text{inc}(x), \text{dec}(x), \text{inc}(y), \text{dec}(y) \right\} \\ \Sigma^{\text{goto}} &= \left\{ \text{goto } l\_k : k \in \{1, \dots, n\} \right\} \cup \\ &\quad \left\{ (\text{goto } l\_k, c = 0) : k \in \{1, \dots, n\}, c \in \{x, y\} \right\} \cup \\ &\quad \left\{ (\text{goto } l\_{k'}, c > 0) : k' \in \{1, \dots, n\}, c \in \{x, y\} \right\} \\ \Sigma^{\text{nohalt}} &= \Sigma^{\text{incdec}} \cup \Sigma^{\text{goto}} \\ \Sigma &= \Sigma^{\text{nohalt}} \cup \left\{ \text{halt} \right\} \end{aligned}$$

When A and B read a word $w \in \Sigma^{+}$, they intuitively simulate a sequence of commands $\tau\_u$ that induces the command trace $u = \text{pref}\_{\text{halt}}(w)$. If $\tau\_u$ fits the actual run of M, and this run 0-halts, then the minimal run of B on w has a value strictly larger than A(w). If, however, $\tau\_u$ does not fit the actual run of M, or it does fit the actual run but it does not 0-halt, then the violation is detected by B, which has a run on w with value strictly smaller than A(w).

In the construction, we use the following partial discount-factor functions $\rho\_p, \rho\_d : \Sigma^{\text{nohalt}} \to \mathbb{N}$ and partial weight functions $\gamma\_p, \gamma\_d : \Sigma^{\text{nohalt}} \to \mathbb{Q}$.

$$\rho\_p(\sigma) = \begin{cases} 5 & \sigma = \text{inc}(x) \\ 4 & \sigma = \text{dec}(x) \\ 7 & \sigma = \text{inc}(y) \\ 6 & \sigma = \text{dec}(y) \\ 15 & \text{otherwise} \end{cases} \qquad \rho\_d(\sigma) = \begin{cases} 4 & \sigma = \text{inc}(x) \\ 5 & \sigma = \text{dec}(x) \\ 6 & \sigma = \text{inc}(y) \\ 7 & \sigma = \text{dec}(y) \\ 15 & \text{otherwise} \end{cases}$$

$\gamma\_p(\sigma) = \frac{\rho\_p(\sigma) - 1}{\rho\_p(\sigma)}$, and $\gamma\_d(\sigma) = \frac{\rho\_d(\sigma) - 1}{\rho\_d(\sigma)}$. We say that $\rho\_p$ and $\gamma\_p$ are the *primal* discount-factor and weight functions, while $\rho\_d$ and $\gamma\_d$ are the *dual* functions. Observe that for every c ∈ {x, y} we have that

$$
\rho\_p(\text{inc}(c)) = \rho\_d(\text{dec}(c)) > \rho\_p(\text{dec}(c)) = \rho\_d(\text{inc}(c)) \tag{1}
$$

Intuitively, we will use the primal functions for A's discount factors and weights, and the dual functions for identifying violations. Notice that if we switch from the primal functions to the dual ones in more occurrences of inc(c) letters than of dec(c) letters along some run, then by Lemma 1 the run gets a value lower than the original one.
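The effect of switching from primal to dual pairs can be seen on a small word. The sketch below uses the discount factors of the primal and dual functions; the helper names `rho` and `run_value`, the boolean list that selects where the dual pair is used, and the sample word are illustrative assumptions.

```python
from fractions import Fraction

RHO_P = {"inc(x)": 5, "dec(x)": 4, "inc(y)": 7, "dec(y)": 6}   # primal
RHO_D = {"inc(x)": 4, "dec(x)": 5, "inc(y)": 6, "dec(y)": 7}   # dual


def rho(table, letter):
    """Discount factor of a non-halt letter; every other letter gets 15."""
    return table.get(letter, 15)


def run_value(letters, use_dual):
    """Discounted sum along a run; use_dual[i] selects the dual pair at step i."""
    total, discount = Fraction(0), Fraction(1)
    for letter, dual in zip(letters, use_dual):
        lam = rho(RHO_D if dual else RHO_P, letter)
        total += Fraction(lam - 1, lam) * discount
        discount /= lam
    return total


word = ["inc(x)", "goto l3", "dec(x)", "inc(x)"]
primal = run_value(word, [False, False, False, False])
balanced = run_value(word, [True, False, True, False])   # one inc, one dec switched
more_inc = run_value(word, [True, False, False, True])   # two incs switched, no dec
```

Switching one inc(x) and one dec(x) leaves the product of discount factors, and hence the value, unchanged, while switching more inc(x) than dec(x) occurrences shrinks the product and lowers the value.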

We continue with their formal definitions. $\mathcal{A} = \langle \Sigma, \{q\_{\mathcal{A}}, q^h\_{\mathcal{A}}\}, \{q\_{\mathcal{A}}\}, \delta\_{\mathcal{A}}, \gamma\_{\mathcal{A}}, \rho\_{\mathcal{A}} \rangle$ is an integral DMDA consisting of two states, as depicted in Fig. 3. Observe that the initial state $q\_{\mathcal{A}}$ has self loops for every alphabet letter in $\Sigma^{\text{nohalt}}$ with weights and discount factors according to the primal functions, and a transition $(q\_{\mathcal{A}}, \text{halt}, q^h\_{\mathcal{A}})$ with a weight of $\frac{14}{15}$ and a discount factor of 15.

Fig. 3. The DMDA A constructed for the proof of Lemma 2.

The integral NMDA $\mathcal{B} = \langle \Sigma, Q\_{\mathcal{B}}, \iota\_{\mathcal{B}}, \delta\_{\mathcal{B}}, \gamma\_{\mathcal{B}}, \rho\_{\mathcal{B}} \rangle$ is the union of the following eight gadgets (checkers), each responsible for checking a certain type of violation in the description of a 0-halting run of M. It also has the states $q\_{\text{freeze}}, q\_{\text{halt}} \in Q\_{\mathcal{B}}$ such that for all σ ∈ Σ, there are 0-weighted transitions $(q\_{\text{freeze}}, \sigma, q\_{\text{freeze}}) \in \delta\_{\mathcal{B}}$ and $(q\_{\text{halt}}, \sigma, q\_{\text{halt}}) \in \delta\_{\mathcal{B}}$ with an arbitrary discount factor. Observe that in all of B's gadgets, the transition over the letter halt to $q\_{\text{halt}}$ has a weight higher than the weight of the corresponding transition in A, so that when no violation is detected, the value of B on a word is higher than the value of A on it.
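The role of the heavier halt transition can be illustrated on a word that ends with its first halt letter. In this sketch the gadget's halt transition is assumed to have weight 15/16 and discount factor 16, the values stated for the command checker; the helper name `value_with_halt` and the sample prefix are illustrative.

```python
from fractions import Fraction

RHO_P = {"inc(x)": 5, "dec(x)": 4, "inc(y)": 7, "dec(y)": 6}   # primal factors


def value_with_halt(prefix, halt_discount):
    """Value of a run that reads `prefix` with primal pairs and then halt with
    discount factor halt_discount and weight (halt_discount - 1)/halt_discount."""
    total, discount = Fraction(0), Fraction(1)
    for lam in [RHO_P.get(s, 15) for s in prefix] + [halt_discount]:
        total += Fraction(lam - 1, lam) * discount
        discount /= lam
    return total


u = ["inc(x)", "dec(x)"]
a_value = value_with_halt(u, 15)        # A: halt has weight 14/15, discount 15
gadget_value = value_with_halt(u, 16)   # gadget run that detects no violation
```

By Lemma 1 the two values are 1 minus the reciprocal of the product of discount factors, so the larger halt discount factor alone makes the gadget's value exceed that of A.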

1. Halt Checker. This gadget, depicted in Fig. 4, checks for violations of non-halting runs. Observe that its initial state $q\_{HC}$ has self loops identical to those of A's initial state, a transition to $q\_{\text{halt}}$ over halt with a weight higher than the corresponding weight in A, and a transition to the state $q\_{\text{last}}$ over every letter that is not halt, "guessing" that the run ends without a halt command.

Fig. 4. The Halt Checker in the NMDA B.

2. Negative-Counters Checker. The second gadget, depicted in Fig. 5, checks that the input prefix u has no more dec(c) than inc(c) commands for each counter c ∈ {x, y}. It is similar to A, but the self loops in its initial states favor dec(c) commands when compared to A.

Fig. 5. The negative-counters checker, on the left for x and on the right for y, in the NMDA B.

3. Positive-Counters Checker. The third gadget, depicted in Fig. 6, checks that for every c ∈ {x, y}, the input prefix u has no more inc(c) than dec(c) commands. It is similar to A, while having self loops in its initial state according to the dual functions rather than the primal ones.

Fig. 6. The Positive-Counters Checker in the NMDA B.

4. Command Checker. The next gadget checks for local violations of successive commands. That is, it makes sure that the letter $w\_i$ represents a command that can follow the command represented by $w\_{i-1}$ in M, ignoring the counter values. For example, if the command in location $l\_2$ is inc(x), then from state $q\_2$, which is associated with $l\_2$, we move with the letter inc(x) to $q\_3$, which is associated with $l\_3$. The test is local, as this gadget does not check for violations involving illegal jumps due to the values of the counters. An example of the command checker for the counter machine in Fig. 2 is given in Fig. 7.

Fig. 7. The command checker that corresponds to the counter machine in Fig. 2.

The command checker, which is a DMDA, consists of states $q\_1, \ldots, q\_n$ that correspond to the commands $l\_1, \ldots, l\_n$, and the states $q\_{\text{halt}}$ and $q\_{\text{freeze}}$. For two locations j and k, there is a transition from $q\_j$ to $q\_k$ on the letter σ iff $l\_k$ can *locally follow* $l\_j$ in a run of M that has σ in the corresponding location of the command trace. That is, one of the following holds: $l\_j$ is a goto $l\_k$ command (meaning $l\_j = \sigma$ = goto $l\_k$); k is the next location after j and $l\_j$ is an inc or a dec command (meaning $k = j + 1$ and $l\_j = \sigma \in \Sigma^{\text{incdec}}$); $l\_j$ is an if c=0 goto $l\_k$ else goto $l\_{k'}$ command with σ = (goto $l\_k$, c = 0); or $l\_j$ is an if c=0 goto $l\_s$ else goto $l\_k$ command with σ = (goto $l\_k$, c > 0). The weights and discount factors of the $\Sigma^{\text{nohalt}}$ transitions mentioned above are according to the primal functions $\gamma\_p$ and $\rho\_p$ respectively. For every location j such that $l\_j$ = halt, there is a transition from $q\_j$ to $q\_{\text{halt}}$ labeled by the letter halt with a weight of $\frac{15}{16}$ and a discount factor of 16. Every other transition that was not specified above leads to $q\_{\text{freeze}}$ with weight 0 and some discount factor.
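The transition structure of the command checker can be generated mechanically from the program. The sketch below is an illustration under assumptions made here: commands are encoded as tuples, states are represented by their location numbers, letters are rendered as strings, and pairs missing from the returned dictionary stand for the transitions to the freeze state.

```python
from fractions import Fraction

# Hypothetical encoding: ("inc", c), ("dec", c), ("goto", k),
# ("if0", c, k, k_else), ("halt",); locations are 1-based.
RHO_P = {"inc(x)": 5, "dec(x)": 4, "inc(y)": 7, "dec(y)": 6}   # primal factors


def command_checker(program):
    """Map (state, letter) -> (next state, weight, discount factor) for the
    specified transitions; any missing pair leads to q_freeze with weight 0."""
    delta = {}
    for j, cmd in enumerate(program, start=1):
        kind = cmd[0]
        if kind == "halt":
            delta[(j, "halt")] = ("q_halt", Fraction(15, 16), 16)
            continue
        if kind in ("inc", "dec"):
            moves = [(f"{kind}({cmd[1]})", j + 1)]
        elif kind == "goto":
            moves = [(f"goto l{cmd[1]}", cmd[1])]
        else:
            _, c, k, k_else = cmd
            moves = [(f"(goto l{k}, {c} = 0)", k),
                     (f"(goto l{k_else}, {c} > 0)", k_else)]
        for letter, target in moves:
            lam = RHO_P.get(letter, 15)          # goto letters get 15
            delta[(j, letter)] = (target, Fraction(lam - 1, lam), lam)
    return delta


program = [("inc", "x"), ("inc", "x"), ("if0", "x", 3, 4),
           ("dec", "x"), ("if0", "x", 6, 3), ("halt",)]
delta = command_checker(program)
```

Each state has at most two specified outgoing letters, which reflects the locality of the test: the checker never inspects counter values, only which command may follow which.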

5,6. Zero-Jump Checkers. The next gadgets, depicted in Fig. 8, check for violations in conditional jumps. In this case, we use a different checker instance for each counter c ∈ {x, y}, ensuring that for every if c=0 goto $l\_k$ else goto $l\_{k'}$ command, if the jump goto $l\_k$ is taken, then the value of c is indeed 0.

Fig. 8. The Zero-Jump Checker (for a counter c ∈ { x, y }) in the NMDA B.

Intuitively, $q^c\_{ZC}$ profits from words that have more inc(c) than dec(c) letters, while $q^c$ continues like A. If the move to $q^c$ occurred after a balanced number of inc(c) and dec(c), as it should be in a real command trace, neither the prefix word before the move to $q^c$, nor the suffix word after it result in a profit. Otherwise, provided that the counter is 0 at the end of the run (as guaranteed by the negative- and positive-counters checkers), both prefix and suffix words get profits, resulting in a smaller value for the run.

7,8. Positive-Jump Checkers. These gadgets, depicted in Fig. 9, are dual to the zero-jump checkers, checking for the dual violations in conditional jumps. Similarly to the zero-jump checkers, we have a different instance for each counter c ∈ {x, y}, ensuring that for every if c=0 goto $l\_k$ else goto $l\_{k'}$ command, if the jump goto $l\_{k'}$ is taken, then the value of c is indeed greater than 0.

Intuitively, if the counter is 0 on a (goto $l\_{k'}$, c > 0) command when there was no inc(c) command yet, the gadget benefits by moving from $q^c\_{PC0}$ to $q\_{\text{freeze}}$. If there was an inc(c) command, it benefits by having the dual functions on the move from $q^c\_{PC0}$ to $q^c\_{PC1}$ over inc(c) and the primal functions on one additional self loop of $q^c\_{PC1}$ over dec(c).

Lemma 2. *Given a two-counter machine* M*, we can compute an integral DMDA* A *and an integral NMDA* B *on finite words, such that* M 0*-halts iff there exists a word* $w \in \Sigma^{+}$ *such that* B(w) ≥ A(w) *iff there exists a word* $w \in \Sigma^{+}$ *such that* B(w) > A(w)*.*

The proof uses the construction presented above, and can be found in [7].

#### 3.2 Undecidability of Comparison

For finite words, the undecidability result directly follows from Lemma 2 and the undecidability of the 0-halting problem of counter machines [31].

Fig. 9. The Positive-Jump Checker (for a counter c) in the NMDA B.

Theorem 1. *Strict and non-strict containment of (integral) NMDAs on finite words are undecidable. More precisely, the following problems are undecidable: deciding, for a given integral NMDA* N *and integral DMDA* D*, whether* N(w) ≤ D(w) *for all finite words* w*, and whether* N(w) < D(w) *for all finite words* w*.*

For infinite words, undecidability of non-strict containment also follows from the reduction given in Section 3.1, as the reduction considers prefixes of the word until the first halt command. We leave open the question of whether strict containment is also undecidable for infinite words. The problem with the latter is that a halt command might never appear in an infinite word w that incorrectly describes a halting run of the two-counter machine, in which case both automata A and B of the reduction will have the same value on w. On words w that have a halt command but do not correctly describe a halting run of the two-counter machine we have B(w) < A(w), and on a word w that does correctly describe a halting run we have B(w) > A(w). Hence, the reduction only relates to whether B(w) ≤ A(w) for all words w, but not to whether B(w) < A(w) for all words w.

Theorem 2. *Non-strict containment of (integral) NMDAs on infinite words is undecidable. More precisely, the following problem is undecidable: deciding, for a given integral NMDA* N *and integral DMDA* D*, whether* N(w) ≤ D(w) *for all infinite words* w*.*

*Proof.* The automata A and B in the reduction given in Section 3.1 can operate as is on infinite words, ignoring the Halt-Checker gadget of B which is only relevant to finite words.

Since the values of both A and B on an input word w only relate to the prefix u = prefhalt(w) of w until the first halt command, we still have that B(w) > A(w) if u correctly describes a halting run of the two-counter machine M and that B(w) < A(w) if u is finite and does not correctly describe a halting run of M.

Yet, for infinite words there is also the possibility that the word w does not contain the halt command. In this case, the value of both A and the command checker of B will converge to 1, getting A(w) = B(w).

Hence, if M 0-halts, there is a word w, such that B(w) > A(w) and otherwise, for all words w, we have B(w) ≤ A(w).

Observe that for NMDAs, equivalence and non-strict containment are interreducible.

Theorem 3. *Equivalence of (integral) NMDAs on finite as well as infinite words is undecidable. That is, the problem of deciding for given integral NMDAs* A *and* B *on finite or infinite words whether* A(w) = B(w) *for all words* w*.*

*Proof.* Assume toward contradiction the existence of a procedure for equivalence check of A and B. We can use the nondeterminism to obtain an automaton C = A∪B, having C(w) ≤ A(w) for all words w. We can then check whether C is equivalent to A, which holds if and only if A(w) ≤ B(w) for all words w. Indeed, if A(w) ≤ B(w) then A(w) ≤ min(A(w), B(w)) = C(w), while if there exists a word w, such that B(w) < A(w), we have C(w) = min(A(w), B(w)) < A(w), implying that C and A are not equivalent. Thus, such a procedure contradicts the undecidability of non-strict containment, shown in Theorems 1 and 2.

## 4 Comparison of NDAs with Different Discount Factors

We present below our algorithm for the comparison of NDAs with different discount factors. We start with automata on infinite words, and then show how to solve the case of finite words by reduction to the case of infinite words.

The algorithm is based on our main observation that, due to the difference between the discount factors, we only need to consider the combination of the automata computation trees up to some level k, after which we can consider first the best/worst continuation of the automaton with the smaller discount factor, and on top of it the worst/best continuation of the second automaton.

For an NDA A, we define its *lowest* (resp. *highest*) *infinite run value* by lowrun(A) (resp. highrun(A)) = min (resp. max) $\{ \mathcal{A}(r) \mid r$ is an infinite run of A (on some word $w \in \Sigma^{\omega}) \}$.

Observe that we can use min and max (rather than inf and sup) since the infimum and supremum values are indeed attainable by specific infinite runs of the NDA (cf. [10, Proof of Theorem 9]). Notice that lowrun(A) and highrun(A) can be calculated in PTIME by a simple reduction to one-player discounted-payoff games [4].

Considering word values, we also refer to the *lowest* (resp. *highest*) *word value* of A, defined by lowword(A) (resp. highword(A)) = min (resp. max) $\{ \mathcal{A}(w) \mid w \in \Sigma^{\omega} \}$. Observe that lowword(A) = lowrun(A), highword(A) ≤ highrun(A), and for a deterministic automaton, highword(A) = highrun(A).

For an NMDA A with states Q, we define the *maximal difference between suffix runs* of A as maxdiff(A) = max $\{ \text{highrun}(\mathcal{A}^q) - \text{lowrun}(\mathcal{A}^q) \mid q \in Q \}$. Notice that maxdiff(A) ≥ 0 and that $\mathcal{A}^q(w)$ is bounded as follows.

$$\text{lowrun}(\mathcal{A}^q) \le \mathcal{A}^q(w) \le \text{lowrun}(\mathcal{A}^q) + \text{maxdiff}(\mathcal{A}) \tag{2}$$
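For a small automaton with a single discount factor, the quantities lowrun, highrun and maxdiff can be approximated by value iteration. Note that this swaps in a simpler numerical technique for illustration: it is not the PTIME procedure via discounted-payoff games referred to above, and it yields approximations rather than exact values. The function name, the transition encoding `(q, letter, q2, weight)`, and the sample automaton are assumptions of this sketch, and every state is assumed to have at least one outgoing transition.

```python
def extreme_run_values(states, transitions, lam, pick, iterations=200):
    """Approximate the lowest (pick=min) or highest (pick=max) infinite run
    value from every state by iterating v(q) = pick of w + v(q2)/lam over the
    transitions (q, letter, q2, w) that leave q."""
    v = {q: 0.0 for q in states}
    for _ in range(iterations):
        v = {q: pick(w + v[q2] / lam
                     for (q1, _, q2, w) in transitions if q1 == q)
             for q in states}
    return v


states = ["p", "q"]
transitions = [("p", "a", "p", 1.0), ("p", "b", "q", 0.0),
               ("q", "a", "q", 0.0), ("q", "b", "q", 0.0)]
low = extreme_run_values(states, transitions, 2, min)
high = extreme_run_values(states, transitions, 2, max)
maxdiff = max(high[s] - low[s] for s in states)
```

In the sample, staying in `p` forever accumulates 1 + 1/2 + 1/4 + ..., which approaches 2, while moving to `q` immediately yields 0, so maxdiff is approximately 2.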

Lemma 3. *There is an algorithm that computes for every input discount factors* $\lambda\_A, \lambda\_D \in \mathbb{Q} \cap (1, \infty)$*,* $\lambda\_A$*-NDA* A *and* $\lambda\_D$*-DDA* D *on infinite words the value of* $\min \{ \mathcal{A}(w) - \mathcal{D}(w) \mid w \in \Sigma^{\omega} \}$*.*

*Proof.* Consider an alphabet Σ, discount factors $\lambda\_A, \lambda\_D \in \mathbb{Q} \cap (1, \infty)$, a $\lambda\_A$-NDA $\mathcal{A} = \langle \Sigma, Q\_{\mathcal{A}}, \iota\_{\mathcal{A}}, \delta\_{\mathcal{A}}, \gamma\_{\mathcal{A}} \rangle$ and a $\lambda\_D$-DDA $\mathcal{D} = \langle \Sigma, Q\_{\mathcal{D}}, \iota\_{\mathcal{D}}, \delta\_{\mathcal{D}}, \gamma\_{\mathcal{D}} \rangle$. When $\lambda\_A = \lambda\_D$, we can generate a $\lambda\_A$-NDA $\mathcal{C} \equiv \mathcal{A} - \mathcal{D}$ over the product of A and D and compute lowword(C).

When $\lambda\_A \neq \lambda\_D$, we consider first the case that $\lambda\_A < \lambda\_D$.

Our algorithm unfolds the computation trees of A and D, up to a level in which only the minimal-valued suffix words of A remain relevant. Due to the massive difference between the accumulated discount factor in A compared to the one in D, any "penalty" of not continuing with a minimal-valued suffix word in A, defined below as $m\_{\mathcal{A}}$, cannot be compensated even by the maximal-valued word of D, whose "profit" is at most as high as maxdiff(D). Hence, at that level, it is enough to look among the minimal-valued suffixes of A for the one that implies the highest value in D.

For every transition $t = (q, \sigma, q') \in \delta\_{\mathcal{A}}$, let $\text{minval}(q, \sigma, q') = \gamma\_{\mathcal{A}}(q, \sigma, q') + \frac{1}{\lambda\_A} \cdot \text{lowword}(\mathcal{A}^{q'})$ be the best (minimal) value that $\mathcal{A}^q$ can get by taking t as the first transition. We say that t is *preferred* if it starts a minimal-valued infinite run of $\mathcal{A}^q$, namely $\delta\_{pr} = \{ t = (q, \sigma, q') \in \delta\_{\mathcal{A}} \mid \text{minval}(t) = \text{lowword}(\mathcal{A}^q) \}$ is the set of preferred transitions of A. Observe that an infinite run of $\mathcal{A}^q$ that takes only transitions from $\delta\_{pr}$ has a value equal to $\text{lowrun}(\mathcal{A}^q)$ (cf. [10, Proof of Theorem 9]).

If all the transitions of A are preferred, A has the same value on all words, and then $\min \{ \mathcal{A}(w) - \mathcal{D}(w) \mid w \in \Sigma^{\omega} \} = \text{lowrun}(\mathcal{A}) - \text{highword}(\mathcal{D})$. (Recall that since D is deterministic, we can easily compute highword(D).) Otherwise, let $m\_{\mathcal{A}}$ be the minimal penalty for not taking a preferred transition in A, meaning $m\_{\mathcal{A}} = \min \{ \text{minval}(t') - \text{minval}(t'') \mid t' = (q, \sigma', q') \in \delta\_{\mathcal{A}} \setminus \delta\_{pr},\ t'' = (q, \sigma'', q'') \in \delta\_{pr} \}$. Observe that $m\_{\mathcal{A}} > 0$.
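Given the lowest word value of every state, the preferred transitions and the minimal penalty can be computed directly from their definitions. The sketch below assumes exact rational values supplied by the caller; the function name, the transition encoding `(q, letter, q2, weight)`, and the sample automaton are illustrative.

```python
from fractions import Fraction


def preferred_and_penalty(transitions, lam, lowword):
    """transitions: list of (q, letter, q2, weight); lowword: state -> value.
    Returns (preferred transitions, minimal penalty); the penalty is None
    when all transitions are preferred."""
    def minval(t):
        q, _, q2, w = t
        return w + lowword[q2] / lam

    preferred = [t for t in transitions if minval(t) == lowword[t[0]]]
    others = [t for t in transitions if minval(t) != lowword[t[0]]]
    if not others:
        return preferred, None
    # Both transitions in the definition of the penalty leave the same state q,
    # and minval of a preferred transition from q equals lowword of q.
    penalty = min(minval(t) - lowword[t[0]] for t in others)
    return preferred, penalty


transitions = [("p", "a", "p", Fraction(1)), ("p", "b", "q", Fraction(0)),
               ("q", "a", "q", Fraction(0)), ("q", "b", "q", Fraction(0))]
lowword = {"p": Fraction(0), "q": Fraction(0)}
preferred, penalty = preferred_and_penalty(transitions, Fraction(2), lowword)
```

Here the self loop of `p` with weight 1 is the only non-preferred transition, and the penalty for taking it is 1.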

Considering the connection between $m\_{\mathcal{A}}$ and maxdiff(D), notice first that if maxdiff(D) = 0, D has the same value on all words, and then we have $\min \{ \mathcal{A}(w) - \mathcal{D}(w) \mid w \in \Sigma^{\omega} \} = \text{lowrun}(\mathcal{A}) - \text{lowrun}(\mathcal{D})$. Otherwise, meaning maxdiff(D) > 0, we unfold the computation trees of A and D for the first k levels, until the maximal difference between suffix runs in D, divided by the accumulated discount factor of D, is smaller than the minimal penalty for not taking a preferred transition in A, divided by the accumulated discount factor of A. That is, k is the minimal integer such that

$$\frac{\text{maxdiff}(\mathcal{D})}{\lambda\_D{}^{k}} < \frac{m\_{\mathcal{A}}}{\lambda\_A{}^{k}}\tag{3}$$

Starting at level k, the penalty incurred by taking a non-preferred transition of A cannot be compensated by a higher-valued word of D.
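The unfolding depth can be found by a direct search, since the ratio between the two accumulated discount factors grows geometrically when the discount factor of A is smaller than that of D. In the sketch below, the function name `unfolding_depth`, the sample numbers, and the choice to start the search at k = 0 are assumptions.

```python
from fractions import Fraction


def unfolding_depth(maxdiff_d, m_a, lam_a, lam_d):
    """Smallest integer k >= 0 such that maxdiff_d / lam_d**k < m_a / lam_a**k,
    assuming 1 < lam_a < lam_d, maxdiff_d > 0 and m_a > 0."""
    assert 1 < lam_a < lam_d and maxdiff_d > 0 and m_a > 0
    k = 0
    while (Fraction(maxdiff_d) / Fraction(lam_d) ** k
           >= Fraction(m_a) / Fraction(lam_a) ** k):
        k += 1
    return k


k = unfolding_depth(maxdiff_d=10, m_a=1, lam_a=2, lam_d=3)
```

With these sample values the condition amounts to (3/2)^k > 10, which first holds at k = 6. The loop terminates because the left-hand side shrinks faster than the right-hand side.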

At level k, we consider separately every run ψ of A on some prefix word u. We should look for a suffix word w that minimizes

$$\mathcal{A}(uw) - \mathcal{D}(uw) = \mathcal{A}(\psi) + \frac{1}{\lambda\_A{}^k} \cdot \mathcal{A}^{\delta\_{\mathcal{A}}(\psi)}(w) - \mathcal{D}(u) - \frac{1}{\lambda\_D{}^k} \cdot \mathcal{D}^{\delta\_{\mathcal{D}}(u)}(w) \tag{4}$$

A central point of the algorithm is that every word that minimizes $\mathcal{A} - \mathcal{D}$ must take only preferred transitions of A starting at level k (full proof in [7]). As all possible remaining continuations after level k yield the same value in A, we can choose among them the continuation that yields the highest value in D.

Let B be the partial automaton with the states of A, but only its preferred transitions $\delta\_{pr}$. (We ignore words on which B has no runs.) We shall use the automata product $\mathcal{B}^{\delta\_{\mathcal{A}}(\psi)} \times \mathcal{D}^{\delta\_{\mathcal{D}}(u)}$ to force suffix words that only take preferred transitions of A, while calculating among them the highest value in D.

Let $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))} = \langle \Sigma, Q\_{\mathcal{A}} \times Q\_{\mathcal{D}}, \{ (\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u)) \}, \delta\_{pr} \times \delta\_{\mathcal{D}}, \gamma\_{\mathcal{C}} \rangle$ be the partial $\lambda\_D$-NDA that is generated by the product of $\mathcal{B}^{\delta\_{\mathcal{A}}(\psi)}$ and $\mathcal{D}^{\delta\_{\mathcal{D}}(u)}$, while only considering the weights (and discount factor) of D, meaning $\gamma\_{\mathcal{C}}((q, p), \sigma, (q', p')) = \gamma\_{\mathcal{D}}(p, \sigma, p')$.

A word w has a run in $\mathcal{A}^{\delta\_{\mathcal{A}}(\psi)}$ that uses only preferred transitions iff w has a run in $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}$. Also, observe that the nondeterminism in C is only related to the nondeterminism in A, and the weight function of C only depends on the weights of D, hence all the runs of $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}$ on the same word result in the same value, which is the value of that word in D. Combining both observations, we get that a word w has a run in $\mathcal{A}^{\delta\_{\mathcal{A}}(\psi)}$ that uses only preferred transitions iff w has a run r in $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}$ such that $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}(r) = \mathcal{D}^{\delta\_{\mathcal{D}}(u)}(w)$. Hence, after taking the k-sized run ψ of A, and under the notations defined in Eq. (4), a suffix word w that can take only preferred transitions of A, and maximizes $\mathcal{D}^{\delta\_{\mathcal{D}}(u)}(w)$, has a value of $\mathcal{D}^{\delta\_{\mathcal{D}}(u)}(w) = \text{highrun}(\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))})$. This leads to

$$\begin{split} & \min \left\{ \mathcal{A}(v) - \mathcal{D}(v) \, \Big| \, v \in \Sigma^{\omega} \right\} = \\ & \min \left\{ \mathcal{A}(\psi) + \frac{\mathcal{A}^{\delta\_{\mathcal{A}}(\psi)}(w)}{\lambda\_{A}{}^{k}} - \mathcal{D}(u) - \frac{\mathcal{D}^{\delta\_{\mathcal{D}}(u)}(w)}{\lambda\_{D}{}^{k}} \, \Big| \, u \in \Sigma^{k},\ w \in \Sigma^{\omega},\ \psi \text{ is a run of } \mathcal{A} \text{ on } u \right\} = \\ & \min \left\{ \mathcal{A}(\psi) + \frac{\text{lowrun}(\mathcal{A}^{\delta\_{\mathcal{A}}(\psi)})}{\lambda\_{A}{}^{k}} - \mathcal{D}(u) - \frac{\text{highrun}(\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi),\delta\_{\mathcal{D}}(u))})}{\lambda\_{D}{}^{k}} \, \Big| \, u \in \Sigma^{k},\ \psi \text{ is a run of } \mathcal{A} \text{ on } u \right\} \end{split}$$

and it is only left to calculate this value for every k-sized run of A, meaning for every leaf in the computation tree of A.

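The case of $\lambda\_A > \lambda\_D$ is analogous, with the following changes:

– For every transition $t = (p, \sigma, p') \in \delta\_{\mathcal{D}}$, we consider $\text{maxval}(p, \sigma, p') = \gamma\_{\mathcal{D}}(p, \sigma, p') + \frac{1}{\lambda\_D} \cdot \text{highword}(\mathcal{D}^{p'})$ instead of minval. The preferred transitions are those of D that start a maximal-valued infinite run, namely $\delta\_{pr} = \{ t = (p, \sigma, p') \in \delta\_{\mathcal{D}} \mid \text{maxval}(t) = \text{highword}(\mathcal{D}^p) \}$, and the minimal penalty $m\_{\mathcal{D}}$ is $m\_{\mathcal{D}} = \min \{ \text{maxval}(t'') - \text{maxval}(t') \mid t'' = (p, \sigma'', p'') \in \delta\_{pr},\ t' = (p, \sigma', p') \in \delta\_{\mathcal{D}} \setminus \delta\_{pr} \}$.

– k should be the minimal integer such that $\frac{\text{maxdiff}(\mathcal{A})}{\lambda\_A{}^{k}} < \frac{m\_{\mathcal{D}}}{\lambda\_D{}^{k}}$.

– We define B to be the restriction of D to its preferred transitions, and $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}$ as a partial $\lambda\_A$-NDA on the product of $\mathcal{A}^{\delta\_{\mathcal{A}}(\psi)}$ and $\mathcal{B}^{\delta\_{\mathcal{D}}(u)}$ while considering the weights of A. We then calculate $\text{lowrun}(\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))})$ for every k-sized run ψ of A, and conclude that $\min \{ \mathcal{A} - \mathcal{D} \}$ is equal to $\min\_{\psi} \left\{ \mathcal{A}(\psi) + \frac{\text{lowrun}(\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))})}{\lambda\_A{}^{k}} - \mathcal{D}(u) - \frac{\text{highrun}(\mathcal{D}^{\delta\_{\mathcal{D}}(u)})}{\lambda\_D{}^{k}} \right\}$. Observe that in this case, it might not hold that all runs of $\mathcal{C}^{(\delta\_{\mathcal{A}}(\psi), \delta\_{\mathcal{D}}(u))}$ on the same word have the same value, but such a property is not required, since we look for the minimal run value (which is the minimal word value).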

Notice that the algorithm of Lemma 3 does not work if switching the direction of containment, namely if considering a deterministic A and a nondeterministic D. The determinism of D is required for finding the maximal value of a valid word in $\mathcal{B}^{\delta\_{\mathcal{A}}(\psi)} \times \mathcal{D}^{\delta\_{\mathcal{D}}(u)}$. If D is not deterministic, the maximal-valued run of $\mathcal{B}^{\delta\_{\mathcal{A}}(\psi)} \times \mathcal{D}^{\delta\_{\mathcal{D}}(u)}$ on some word w equals the value of some run of D on w, but not necessarily the value of D on w. We also need D to be deterministic for computing $\text{highword}(\mathcal{D}^p)$ in the case that $\lambda\_A > \lambda\_D$.

Moving to automata on finite words, we reduce the problem to the corresponding problem handled in Lemma 3, by adding to the alphabet a new letter that represents the end of the word, and making some required adjustments.

Lemma 4. *There is an algorithm that computes for every input discount factors* $\lambda\_A, \lambda\_D \in \mathbb{Q} \cap (1, \infty)$*,* $\lambda\_A$*-NDA* A *and* $\lambda\_D$*-DDA* D *on finite words the value of* $\inf \{ \mathcal{A}(u) - \mathcal{D}(u) \mid u \in \Sigma^{+} \}$*, and determines if there exists a finite word* u *for which* A(u) − D(u) *equals that value.*

*Proof.* Without loss of generality, we assume that initial states of automata have no incoming transitions. (Every automaton can be changed in linear time to an equivalent automaton with this property.)

We convert, as described below, an NDA N on finite words to an NDA $\hat{\mathcal{N}}$ on infinite words, such that $\hat{\mathcal{N}}$ intuitively simulates the finite runs of N. For an alphabet Σ, a discount factor $\lambda \in \mathbb{Q} \cap (1, \infty)$, and a λ-NDA (DDA) $\mathcal{N} = \langle \Sigma, Q\_{\mathcal{N}}, \iota\_{\mathcal{N}}, \delta\_{\mathcal{N}}, \gamma\_{\mathcal{N}} \rangle$ on finite words, we define the λ-NDA (DDA) $\hat{\mathcal{N}} = \langle \hat{\Sigma}, Q\_{\mathcal{N}} \cup \{ q\_{\tau} \}, \iota\_{\mathcal{N}}, \delta\_{\hat{\mathcal{N}}}, \gamma\_{\hat{\mathcal{N}}} \rangle$ on infinite words. The new alphabet $\hat{\Sigma} = \Sigma \cup \{ \tau \}$ contains a new letter $\tau \notin \Sigma$ that indicates the end of a finite word. The new state $q\_{\tau}$ has 0-valued self loops on every letter in the alphabet, and there are 0-valued transitions from every non-initial state to $q\_{\tau}$ on the new letter τ. Formally, $\delta\_{\hat{\mathcal{N}}} = \delta\_{\mathcal{N}} \cup \{ (q\_{\tau}, \sigma, q\_{\tau}) \mid \sigma \in \hat{\Sigma} \} \cup \{ (q, \tau, q\_{\tau}) \mid q \in Q\_{\mathcal{N}} \setminus \iota\_{\mathcal{N}} \}$, and

$$\gamma\_{\hat{\mathcal{N}}}(t) = \begin{cases} \gamma\_{\mathcal{N}}(t) & t \in \delta\_{\mathcal{N}} \\ 0 & \text{otherwise} \end{cases}$$
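The conversion to infinite words is a purely syntactic extension of the automaton, which the following sketch makes explicit. The function name, the default names `"tau"` and `"q_tau"` for the new letter and state, and the encoding of weighted transitions as tuples `(q, letter, q2, weight)` are assumptions made here.

```python
def to_infinite_words(alphabet, states, initial, transitions,
                      tau="tau", sink="q_tau"):
    """Return the alphabet, states, initial states and weighted transitions
    of the converted automaton on infinite words."""
    assert tau not in alphabet and sink not in states
    new_alphabet = set(alphabet) | {tau}
    new_transitions = list(transitions)          # original weights are kept
    # 0-weighted self loops of the new state on every letter, including tau.
    new_transitions += [(sink, a, sink, 0) for a in sorted(new_alphabet)]
    # 0-weighted tau-transitions from every non-initial state to the new state.
    new_transitions += [(q, tau, sink, 0)
                        for q in sorted(states) if q not in initial]
    return new_alphabet, set(states) | {sink}, set(initial), new_transitions


alphabet, states, initial = {"a"}, {"s", "t"}, {"s"}
transitions = [("s", "a", "t", 3), ("t", "a", "t", 1)]
hat_alphabet, hat_states, hat_initial, hat_transitions = to_infinite_words(
    alphabet, states, initial, transitions)
```

Initial states receive no tau-transition, so the empty word is not simulated from them, matching the restriction to nonempty finite words.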

Observe that for every state $q \in Q\_{\mathcal{N}}$, the following hold.


Hence, for every <sup>q</sup> <sup>∈</sup> <sup>Q</sup><sup>N</sup> we have inf { N <sup>q</sup>(r) - <sup>r</sup> is a run of <sup>N</sup> <sup>q</sup> } <sup>=</sup> lowrun(N<sup>ˆ</sup> <sup>q</sup>) and sup { N <sup>q</sup>(r) - <sup>r</sup> is a run of <sup>N</sup> <sup>q</sup> } <sup>=</sup> highrun(N<sup>ˆ</sup> <sup>q</sup>). (For a non-initial state <sup>q</sup>, we also consider the "run" of <sup>N</sup> <sup>q</sup> on the empty word, and define its value to be <sup>0</sup>.) Notice that the infimum (supremum) run value of <sup>N</sup> <sup>q</sup> is attained by an actual run of <sup>N</sup> <sup>q</sup> iff there is an infinite run of <sup>N</sup><sup>ˆ</sup> <sup>q</sup> that gets this value and takes a τ transition.

For every state $q \in Q\_{\hat{\mathcal{N}}}$, we can determine, as follows, whether $\text{lowrun}(\hat{\mathcal{N}}^q)$ is attained by an infinite run taking a $\tau$ transition. We calculate $\text{lowrun}(\hat{\mathcal{N}}^q)$ for all states, and then start a process that iteratively marks the states of $\hat{\mathcal{N}}$, such that at the end, $q \in Q\_{\hat{\mathcal{N}}}$ is marked iff $\text{lowrun}(\hat{\mathcal{N}}^q)$ can be achieved by a run with a $\tau$ transition. We start with $q\_\tau$ as the only marked state. In each iteration we further mark every state $q$ from which there exists a preferred transition $t = (q, \sigma, q') \in \delta\_{pr}$ to some marked state $q'$. The process terminates when an iteration has no new states to mark. Analogously, we can determine whether $\text{highrun}(\hat{\mathcal{N}}^q)$ is attained by a run that goes to $q\_\tau$.
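The marking process is a backward reachability fixpoint over the preferred transitions. A minimal sketch, assuming the set of preferred transitions has already been computed and is given as triples (the function name `mark_states` is hypothetical):

```python
def mark_states(preferred, sink):
    """Return the states that can reach `sink` using preferred transitions only.

    `preferred` is an iterable of triples (q, letter, q2); computing it
    is outside the scope of this sketch.
    """
    preferred = list(preferred)
    marked = {sink}  # start with the sink as the only marked state
    changed = True
    while changed:   # stop when an iteration marks nothing new
        changed = False
        for (q, _letter, q2) in preferred:
            if q2 in marked and q not in marked:
                marked.add(q)
                changed = True
    return marked
```

Each iteration either marks a new state or terminates, so the loop runs at most as many times as there are states.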

Consider discount factors $\lambda\_A, \lambda\_D \in \mathbb{Q} \cap (1, \infty)$, a $\lambda\_A$-NDA $\mathcal{A}$ and a $\lambda\_D$-DDA $\mathcal{D}$ on finite words. When $\lambda\_A = \lambda\_D$, similarly to Lemma 3, the algorithm finds the infimum value of $\mathcal{C} \equiv \mathcal{A} - \mathcal{D}$ using $\hat{\mathcal{C}}$, and determines if an actual finite word attains this value using the process described above.

Otherwise, the algorithm converts $\mathcal{A}$ and $\mathcal{D}$ to $\hat{\mathcal{A}}$ and $\hat{\mathcal{D}}$, and proceeds as in Lemma 3 over $\hat{\mathcal{A}}$ and $\hat{\mathcal{D}}$. According to the above observations, we have that $\inf \{ \mathcal{A}(u) - \mathcal{D}(u) \mid u \in \Sigma^{+} \} = \min \{ \hat{\mathcal{A}}(w) - \hat{\mathcal{D}}(w) \mid w \in \Sigma^{\omega} \}$, and that $\inf \{ \mathcal{A}(u) - \mathcal{D}(u) \}$ is attainable iff $\min \{ \hat{\mathcal{A}}(w) - \hat{\mathcal{D}}(w) \}$ is attainable by some word that has a $\tau$ transition. Hence, whenever computing lowrun or highrun, we also perform the process described above, to determine whether this value is attainable by a run that has a $\tau$ transition. We determine that $\inf \{ \mathcal{A}(u) - \mathcal{D}(u) \}$ is attainable iff there exists a leaf of the computation tree that leads to it, for which the relevant values lowrun and highrun are attainable.

Complexity analysis We show below that the algorithm of Lemmas 3 and 4 needs only polynomial space with respect to the size of the input automata, implying a PSPACE algorithm for the corresponding decision problems. We define the size of an NDA $\mathcal{N}$, denoted by $|\mathcal{N}|$, as the maximum between the number of its transitions, the maximal binary representation of any weight in it, and the maximal unary representation of the discount factor. (Binary representation of the discount factors might cause our algorithm to use exponential space when the two factors are very close to each other.) The input NDAs may have rational weights, yet it will be more convenient to consider equivalent NDAs with integral weights that are obtained by multiplying all the weights by their common denominator [6]. (Observe that this causes the values of all words to be multiplied by the same ratio, and it keeps the same input size, up to a polynomial change.)

Before proceeding to the complexity analysis, we provide an auxiliary lemma (proof appears in [7]).

Lemma 5. For all integers $p > q \in \mathbb{N} \setminus \{0\}$, a $\frac{p}{q}$-NDA $\mathcal{A}$ with integral weights, and a lasso run $r = t\_0, t\_1, \ldots, t\_{x-1}, (t\_x, t\_{x+1}, \ldots, t\_{x+y-1})^{\omega}$ of $\mathcal{A}$, there exists an integer $b$, such that $\mathcal{A}(r) = \frac{b}{p^x (p^y - q^y)}$.
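The shape of the value in Lemma 5 can be checked numerically with exact rational arithmetic. The sketch below assumes the usual discounted-sum convention, which is not restated in this section, that the $i$-th transition of a run contributes its weight divided by $\lambda^i$; the function name `lasso_value` is hypothetical.

```python
from fractions import Fraction


def lasso_value(prefix, cycle, p, q):
    """Discounted sum of a lasso run with discount factor p/q.

    `prefix` holds the integral weights of t_0..t_{x-1}, `cycle` those of
    the repeated part t_x..t_{x+y-1} (must be non-empty).
    Assumes value = sum_i weight_i / lambda**i.
    """
    lam = Fraction(p, q)
    x, y = len(prefix), len(cycle)
    head = sum(Fraction(w) / lam**i for i, w in enumerate(prefix))
    one_round = sum(Fraction(w) / lam**j for j, w in enumerate(cycle))
    # Geometric series: every further round is discounted by another lam**y.
    tail = one_round / lam**x / (1 - 1 / lam**y)
    return head + tail


# Weight 1 once, then weight 2 forever, with lambda = 3/2.
value = lasso_value([1], [2], 3, 2)
b = value * 3**1 * (3**1 - 2**1)  # multiply by p^x (p^y - q^y)
```

In this instance the value is 5 and `b` equals 15, an integer as the lemma states.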

Proceeding to the complexity analysis, let the input size be $S = |\mathcal{A}| + |\mathcal{D}|$, the reduced forms of $\lambda\_A$ and $\lambda\_D$ be $\frac{p}{q}$ and $\frac{p\_D}{q\_D}$ respectively, the number of states in $\mathcal{A}$ be $n$, and the maximal difference between transition weights in $\mathcal{D}$ be $M$. Observe that $n \leq S$, $p \leq S$, $M \leq 2 \cdot 2^S$, $\frac{\lambda\_D}{\lambda\_D - 1} \leq \frac{p\_D}{p\_D - q\_D} \leq p\_D \leq S$, and for $\lambda\_D > \lambda\_A > 1$, we also have $\frac{\lambda\_D}{\lambda\_A} = \frac{p\_D \cdot q}{q\_D \cdot p} \geq 1 + \frac{1}{S^2}$ (the ratio is a fraction of integers that exceeds 1, and its denominator $q\_D \cdot p$ is at most $S^2$).

Observe that $\mathcal{A}$ has a best infinite run (and $\mathcal{D}$ has a worst infinite run), in a lasso form as in Lemma 5, with $x, y \in [1..n]$. Indeed, following preferred transitions, a run must complete a lasso, and then may forever repeat its choices of preferred transitions. Hence, $m\_{\mathcal{A}}$, being the difference between two lasso runs, is in the form of

$$\begin{aligned} m\_{\mathcal{A}} &= \frac{b\_1}{p^{x\_1}(p^{y\_1} - q^{y\_1})} - \frac{b\_2}{p^{x\_2}(p^{y\_2} - q^{y\_2})} = \frac{b\_3}{p^n(p^{y\_1} - q^{y\_1})(p^{y\_2} - q^{y\_2})} > \frac{b\_3}{p^n p^{y\_1} p^{y\_2}} \\ &\ge \frac{1}{p^{3n}} \ge \frac{1}{S^{3S}} \overset{\text{for } S \ge 1}{>} \frac{1}{(2^S)^{3S}} = \frac{1}{2^{3S^2}} \end{aligned}$$

for some $x\_1, x\_2, y\_1, y\_2 \leq n$ and some integers $b\_1, b\_2, b\_3$. (Similarly, we can show that $m\_{\mathcal{D}} > \frac{1}{2^{3S^2}}$.) We have $\text{maxdiff}(\mathcal{D}) \leq M \cdot \frac{\lambda\_D}{\lambda\_D - 1}$, hence

$$\frac{\text{MAXDIFF}(\mathcal{D})}{m\_{\mathcal{A}}} \le \frac{M \cdot \frac{\lambda\_{\mathcal{D}}}{\lambda\_{\mathcal{D}} - 1}}{m\_{\mathcal{A}}} \le \frac{2^{1 + S} \cdot S}{m\_{\mathcal{A}}} \stackrel{\text{(for } S \ge 1)}{<} \frac{2^{3S}}{m\_{\mathcal{A}}} < 2^{3S + 3S^2}$$

Recall that we unfold the computation tree until level $k$, which is the minimal integer such that $\left(\frac{\lambda\_D}{\lambda\_A}\right)^k > \frac{\text{maxdiff}(\mathcal{D})}{m\_{\mathcal{A}}}$. Observe that for $S \geq 1$ we have $\left(\frac{\lambda\_D}{\lambda\_A}\right)^{S^2} \geq \left(1 + \frac{1}{S^2}\right)^{S^2} \geq 2$, hence for $k' = S^2 \cdot (3S + 3S^2)$, we have

$$\left(\frac{\lambda\_D}{\lambda\_A}\right)^{k'} = \left(\left(\frac{\lambda\_D}{\lambda\_A}\right)^{S^2}\right)^{3S + 3S^2} \ge 2^{3S + 3S^2} > \frac{\text{MAXDIFF}(\mathcal{D})}{m\_{\mathcal{A}}}$$

meaning that k is polynomial in S. Similar analysis shows that k is polynomial in S also for λ<sup>D</sup> < λA.
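To make the choice of the unfolding level concrete, the following sketch computes the minimal level for given numbers by exact arithmetic. It is only an illustration: the function name `unfolding_depth` is hypothetical, and the inputs for maxdiff and for the minimal difference are assumed to be already known.

```python
from fractions import Fraction


def unfolding_depth(lam_a, lam_d, maxdiff_d, m_a):
    """Smallest k with (lam_d / lam_a)**k > maxdiff_d / m_a, for lam_d > lam_a."""
    ratio = Fraction(lam_d) / Fraction(lam_a)
    assert ratio > 1, "this sketch covers only the case lam_d > lam_a"
    target = Fraction(maxdiff_d) / Fraction(m_a)
    k, power = 0, Fraction(1)
    while power <= target:  # multiply until the ratio power exceeds the target
        power *= ratio
        k += 1
    return k


# With lambda_A = 3/2 and lambda_D = 2 the ratio is 4/3.
depth = unfolding_depth(Fraction(3, 2), 2, 10, 1)
```

Because the ratio is at least $1 + \frac{1}{S^2}$, the loop doubles `power` at least once every $S^2$ iterations, which is the reason the level stays polynomial in $S$.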

Considering decision problems that use our algorithm, due to the equivalence of NPSPACE and PSPACE, the algorithm can nondeterministically guess an optimal prefix word $u$ of size $k$, letter by letter, as well as a run $\psi$ of $\mathcal{A}$ on $u$, transition by transition, and then compute the value of $\mathcal{A}(\psi) + \frac{\text{lowrun}(\mathcal{A}^{\delta\_A(\psi)})}{\lambda\_A^k} - \mathcal{D}(u) - \frac{\text{highrun}(\mathcal{C}^{(\delta\_A(\psi), \delta\_D(u))})}{\lambda\_D^k}$.

Observe that along the run of the algorithm, we need to save the following information, which can be done in polynomial space:


We thus get the following complexity result.

Theorem 4. *For input discount factors $\lambda\_A, \lambda\_D \in \mathbb{Q} \cap (1, \infty)$, a $\lambda\_A$-NDA $\mathcal{A}$ and a $\lambda\_D$-DDA $\mathcal{D}$ on finite or infinite words, it is decidable in PSPACE whether $\mathcal{A}(w) \geq \mathcal{D}(w)$ and whether $\mathcal{A}(w) > \mathcal{D}(w)$ for all words $w$.*

*Proof.* We use Lemma 3 in the case of infinite words and Lemma 4 in the case of finite words, checking whether min { A(w) − D(w) } < 0 and whether min { A(w) − D(w) } ≤ 0. In the case of finite words, we also use the information of whether there is an actual word that gets the desired value. 

Since integral NDAs can always be determinized [8], we get as a corollary that there is an algorithm to decide equivalence and strict and non-strict containment of integral NDAs with different (or the same) discount factors. Note, however, that it might not be in PSPACE, since determinization exponentially increases the number of states, resulting in k that is exponential in S, and storing in binary representation values in the order of λ<sup>k</sup> might require exponential space.

Corollary 1. *There are algorithms to decide for input integral discount factors $\lambda\_A, \lambda\_B \in \mathbb{N}$, a $\lambda\_A$-NDA $\mathcal{A}$ and a $\lambda\_B$-NDA $\mathcal{B}$ on finite or infinite words whether or not $\mathcal{A}(w) > \mathcal{B}(w)$, $\mathcal{A}(w) \geq \mathcal{B}(w)$, or $\mathcal{A}(w) = \mathcal{B}(w)$ for all words $w$.*

## 5 Conclusions

The new decidability result, providing an algorithm for comparing discounted-sum automata with different integral discount factors, may allow extending the usage of discounted-sum automata in formal verification, while the undecidability result strengthens the justification of restricting discounted-sum automata with multiple integral discount factors to tidy NMDAs. The new algorithm also extends the possible, more limited, usage of discounted-sum automata with rational discount factors, while further research should be put into this direction.

Acknowledgements We thank Guillermo A. Perez for stimulating discussions on the comparison of integral NDAs with different discount factors.

## References



## Fast Matching of Regular Patterns with Synchronizing Counting

Lukáš Holík, Juraj Síč, Lenka Turoňová, and Tomáš Vojnar

Brno University of Technology, Brno, Czech Republic {holik,sicjuraj,ituronova,vojnar}@fit.vut.cz

Abstract. Fast matching of regular expressions with *bounded repetition*, aka *counting*, such as (ab){50,100}, i.e., matching linear in the length of the text and independent of the repetition bounds, has been an open problem for at least two decades. We show that, for a wide class of regular expressions with counting, which we call *synchronizing*, fast matching is possible. We empirically show that the class covers nearly all counting used in usual applications of regex matching. This complexity result is based on an improvement and analysis of a recent matching algorithm that compiles regexes to deterministic counting-set automata (automata with registers that hold sets of numbers).

## 1 Introduction

Fast matching of regular expressions with *bounded repetition*, aka *counting*, has been an open problem for at least two decades (cf., e.g., [33]). The time complexity of the standard matching algorithms run on a regex such as .\*a.{100} is, at best, dominated by the *length of the text multiplied by the repetition bounds*. This makes matching prone to unacceptable slowdowns since the length of the text as well as the repetition bounds are often large. In this paper, we provide a theoretical basis for matching of bounded repetition with a much more reliable performance. We show that a large and practical class of regexes with counting theoretically allows fast matching—in time independent of the counter bounds and linear in the length of the text.

The problem also has a strong practical motivation. Regex matching is used for searching, data validation, detection of information leakage, parsing, replacing, data scraping, syntax highlighting, etc. It is natively supported in most programming languages [6], and ubiquitous (used in 30–40 % of Java, JavaScript, and Python software [7,39,8,5]). Efficiency and predictability of regex matching are important. An extreme run-time of matching can have serious consequences, such as a failed input validation against injection attacks [41] and events like the outage of Cloudflare services [18]. Regex vulnerabilities are also a doorway for the *ReDoS (regular expression denial of service) attack*, in which the attacker crafts a text to overwhelm a matcher (as, e.g., in the case of the outage of StackOverflow [13] or the websites exposed due to their use of the popular Express.js framework [3]). ReDoS has been widely recognized as a common and serious threat [7,9,11], with counting in regexes being especially dangerous [37].

*Matching algorithms and complexity.* The potential instability of the pattern matchers is in line with the worst-case complexity of the matching algorithms. The most widely used approach to matching is backtracking (used, e.g., in standard matchers of .NET, Python, Perl, PHP, Java, JavaScript, Ruby) for its simplicity and ease of implementation of advanced features such as back-references or look-arounds. It is, however, at worst exponential in the length of the matched text and prone to ReDoS. Even though this can be improved, for instance by memoization [11], the fastest matchers used in performance critical applications all use automata-based algorithms instead of backtracking. The basis of these approaches is Thompson's algorithm [35] (also referred to as *online NFA-simulation*). Together with many optimizations, it is implemented in Intel's Hyperscan [40]. When combined with caching, it becomes the on-the-fly subset construction of a DFA, also called *online DFA-simulation* (implemented in RE2 from Google, GNU grep, SRM, or the standard matcher of Rust [17,19,30,12]). Without counting, the major factor in the worst-case complexity is *O*(*nm*<sup>2</sup>), with *n* being the length of the text and *m* the number of character occurrences in the regex (*m* is smaller than the size of the regex, i.e., the length of the string defining it). We say that the *character cost*, i.e., the cost of extending the text with one character, is *m*<sup>2</sup>. This is the cost of iterating through transitions of an NFA with *O*(*m*) states and *O*(*m*<sup>2</sup>) transitions compiled from the regex by some classical construction [2,16,24].

Extending the syntax of regexes with *bounded quantifiers* (or *counters*), such as (ab){50,100}, increases the character complexity dramatically. Given *k* counters with the maximum bound ℓ, the number of NFA states rises to *O*(*m*ℓ<sup>*k*</sup>), the number of transitions as well as the character cost to *O*((*m*ℓ<sup>*k*</sup>)<sup>2</sup>). For instance, the minimal DFA for .\*a.{k} (i.e., *a* appears *k* characters from the end) has more than 2<sup>*k*</sup> states. Moreover, note that, since *k* is written as a decimal numeral, its value is exponential in the size of the regex. This makes matching with already moderately high *k* prone to significant slowdowns and ReDoS vulnerabilities with virtually every mainstream matcher (see [36,37]). At the same time, repetition bounds easily reach thousands, in extreme cases tens of millions (in real-life XML [4]). Writing a dangerous counting expression is easy, and such an expression is hard to identify. Security-critical solutions may be vulnerable to counting-related ReDoS [37] despite an extra effort spent in regex design and testing, hence developers sometimes avoid counting, use workarounds, and restrict functionality.

The problem of matching with bounded repetition has been addressed from the theoretical as well as from the practical perspective by a number of authors [15,4,22,26,31,20,25,36]. From these, the recent work [36] is the only one offering fast matching for a practically significant class of regexes. The algorithm of [36] compiles a regex with counting to a non-deterministic *counting automaton (CA)*, an automaton with counters that can be incremented, reset, and compared with a constant. The crux of the problem is then to convert the CA to a succinct deterministic machine that could be simulated fast in matching. The work [36] achieves this by determinizing the CA into a *counting-set automaton (CSA)*, an automaton with registers that hold *sets* of numbers. Its size is independent of the counter bounds and it updates the sets by a handful of operations that are all constant time, regardless of the size of the sets. However, regexes outside the supported class do appear, the class has no syntactic characterization, and it is hard to recognize (as demonstrated also by an incorrect proposal of a syntactic class in [36] itself). For instance, .\*a{5} or (ab){5} are handled, but .\*(aa){5} or .\*(ab){5} are not (the requirement is technical, see Section 4).

*Our contribution.* In this paper, we

1. generalize the algorithm of [36] to extend the class of handled regexes and
2. derive a useful syntactic characterization of the extended class.

The derived class is characterized by *flat counting* (counting operators are not nested) where repetitions of each counted expression *R* are *synchronizing* (a word from *R*<sup>*n*</sup> cannot have a prefix from *R*<sup>*n*+1</sup>). It is the first clearly delimited practical class of regexes with counting that allows fast matching. It includes the easily recognizable and frequent case where every word in *R* has exactly one occurrence of a *marker*, a letter or a word from a finite set of markers that unambiguously identifies each occurrence of *R* (note that even this simple class was not handled by any previous fast algorithms, including [36]). In our experiment with a large set of regexes from various sources, 99.6 % of non-trivial flat counting was synchronizing and 99.2 % was letter-marked.

To obtain the results (1) and (2) above, we first modify the determinization of [36] to include the entire class of regexes with flat counting. In a nutshell, this is achieved by two changes: (i) We allow copying and uniting of sets stored in registers, and (ii) in the determinization, we index counters of the CA by its states to handle CA in which nondeterministic runs that reach different states reach different counter values.

These modifications come with the main technical challenge that we solve in this paper: copying and uniting sets is not constant-time but linear in the size of the sets. This would make the character cost linear in the counter bound again. To remove the dependency on the counter bounds, we augment the determinization by optimizations that avoid the copying and uniting. First, to alleviate the cost of uniting, we store intersections of sets stored in registers in new shared registers, so that the intersection does not contribute to the cost of uniting the registers. Then, to increase the impact of intersection sharing, we synchronize register updates in order to make their intersections larger. We then show that if the CSA *does not replicate registers*, i.e., each register can in a transition appear on the right-hand side of only one register assignment, then it never copies registers and the cost of unions can be amortised. Finally, we define the class of regexes with *synchronizing counting* for which the optimized CSA do not replicate counters, so their simulation in matching is fast.

*Related work.* In the context of regex matching, counting automata were used in several forms under several names (e.g. [20,36,4,15,31,32,33,14,23]). Besides [36] discussed above, other solutions to matching of counting regexes [15,4,22,26,31,20,25] handle small classes of regexes or do not allow matching linear in the text size and independent of counter bounds. The work [20] proposes a CA-to-CA determinization producing smaller automata than the explicit CA determinization for the limited class of monadic regexes, covered by letter-marked counting, and the size of their deterministic automata is still dependent on the counter bounds. The work [4] uses a notion of automata with counters of [15]. It focuses mostly on deterministic regexes, a class much smaller than regexes with synchronizing counting, and proposes a matching algorithm still dependent on the counter bounds. The paper [25] proposes an algorithm that takes time at worst quadratic to the length of the text. Extended FA (XFA) of [31,32] augment NFA with a scratch memory of bits that can represent counters, and their determinization is exponential in counter bounds already for regexes such as .\*a.{*k*}. The *counter-1 unambiguous* regexes of [22,23] can be directly compiled into deterministic automata called FACs, similar to our CA, independent of counter bounds, but the class is limited, excluding e.g., .\*a.{*k*}.

## 2 Preliminaries

We use N to denote the natural numbers including 0. For a set *S*, *P*(*S*) denotes its powerset and *P*fin(*S*) is the set of all *finite* subsets of *S*.

A *first order language (f.o.l.)* $\Gamma = (F, P)$ consists of a set of *function symbols* $F$ and a set of *predicate symbols* $P$. An *interpretation* $\mathcal{I}$ of $\Gamma$ with a *domain* $D\_{\mathcal{I}}$ assigns a function $f^{\mathcal{I}} : D\_{\mathcal{I}}^n \to D\_{\mathcal{I}}$ to each $n$-ary $f \in F$ and a function $p^{\mathcal{I}} : D\_{\mathcal{I}}^n \to \{0, 1\}$ to each $n$-ary $p \in P$. An *assignment* of a set of variables $X$ in $\mathcal{I}$ is a total function $\nu : X \to D\_{\mathcal{I}}$. The set of *terms* $\mathrm{Terms}\_{\Gamma,X}$ and the set $\mathrm{QFF}\_{\Gamma,X}$ of *quantifier free formulae* (boolean combinations of atomic formulae) over $\Gamma$ and $X$, as well as the interpretation of a term, $t^{\mathcal{I}}(\nu)$, and a formula, $\varphi^{\mathcal{I}}(\nu)$, are defined as usual. We denote by $\nu \models\_{\mathcal{I}} \varphi$ that the formula $\varphi$ is *satisfied* (interpreted as true) by the assignment $\nu$. It is then *satisfiable*. We drop the sub/superscript $\mathcal{I}$ when it is clear from the context. We write $\varphi[x]$ and $t[x]$ to denote a unary formula $\varphi$ or term $t$, respectively, with the free variable $x$, and we may also abuse this notation to denote the term/formula with its only free variable replaced by $x$. We write $t^{\mathcal{I}}(k)$ and $\varphi^{\mathcal{I}}(k)$ to denote the values $t^{\mathcal{I}}(\{x \mapsto k\})$ and $\varphi^{\mathcal{I}}(\{x \mapsto k\})$. For a set of formulae $\Psi = \{\psi\_1, \ldots, \psi\_n\}$, the set $\mathit{Minterms}(\Psi)$ consists of all *minterms* of $\Psi$, satisfiable conjunctions $\varphi\_1 \land \cdots \land \varphi\_n$ where for each $i : 1 \leq i \leq n$, $\varphi\_i$ is $\psi\_i$ or $\neg\psi\_i$.

We fix a finite *alphabet* $\Sigma$ of *symbols/letters* for the rest of the paper. Words are sequences of letters, with the *empty word* $\varepsilon$. The *concatenation* of words $u$ and $v$ is denoted $u \cdot v$, $uv$ for short. A set of words over $\Sigma$ is a *language*, the concatenation of languages is $L \cdot L' = \{ u \cdot v \mid u \in L \land v \in L' \}$, $LL'$ for short. *Bounded iteration* $x^i$, $i \in \mathbb{N}$, of a word or a language $x$ is defined by $x^0 = \varepsilon$ for a word, $x^0 = \{\varepsilon\}$ for a language, and $x^{i+1} = x^i \cdot x$. Then $x^{\ast} = \bigcup\_{i \in \mathbb{N}} x^i$. We consider a usual basic syntax of *regular expressions (regexes)*, generated by the grammar $R ::= \varepsilon \mid a \mid (R) \mid RR \mid R \texttt{|} R \mid R{\ast} \mid R\{m,n\}$ where $m \in \mathbb{N}$, $n \in \mathbb{N} \cup \{\infty\}$, $0 \leq m$, $0 < n$, $m \leq n$, and $a \in \Sigma$. We use $R\{m\}$ for $R\{m,m\}$. Regexes containing a sub-expression with the *counter* $R\{m,n\}$ or $R\{m\}$ are called *counting regexes* and $m, n$ are *counter bounds*. We denote by $\max\_R$ the maximum integer occurring in the counter bounds of regex $R$ and we denote the number of counters by $cnt\_R$. A regex with *flat counting* does not have nested counting, that is, in a sub-regex $S\{m,n\}$, $S$ cannot contain counting. The *language* of a regex $R$ is constructed inductively on its structure: $L(\varepsilon) = \{\varepsilon\}$, $L(a) = \{a\}$ for $a \in \Sigma$, $L(RR') = L(R) \cdot L(R')$, $L(R{\ast}) = L(R)^{\ast}$, $L(R \texttt{|} R') = L(R) \cup L(R')$, and $L(R\{m,n\}) = \bigcup\_{m \leq i \leq n} L(R)^i$. We understand $|R|$ simply as the length of the defining string, e.g. $|\texttt{(ab)\{10\}}| = 8$. We define $\sharp R$ as the number of character occurrences in $R$, formally, $\sharp a = 1$ for $a \in \Sigma$, $\sharp\varepsilon = 0$, $\sharp(R) = \sharp R\{m,n\} = \sharp R$, and $\sharp R \cdot S = \sharp R \texttt{|} S = \sharp R + \sharp S$.
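The syntactic notions above (the number of character occurrences and flat counting) can be computed by a straightforward recursion over the regex syntax tree. The sketch below is an illustration only: the class names are hypothetical, and the case of the star operator in `occurrences` is an assumption, since the text does not list it explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Sym:
    letter: str


@dataclass(frozen=True)
class Cat:
    left: object
    right: object


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    body: object


@dataclass(frozen=True)
class Count:
    body: object
    low: int
    high: float  # use float("inf") for an unbounded counter


def occurrences(r):
    """Number of character occurrences in the regex."""
    if isinstance(r, Eps):
        return 0
    if isinstance(r, Sym):
        return 1
    if isinstance(r, (Cat, Alt)):
        return occurrences(r.left) + occurrences(r.right)
    return occurrences(r.body)  # Count, and Star by assumption


def has_counting(r):
    if isinstance(r, Count):
        return True
    if isinstance(r, (Cat, Alt)):
        return has_counting(r.left) or has_counting(r.right)
    if isinstance(r, Star):
        return has_counting(r.body)
    return False


def is_flat(r):
    """True iff no counter is nested inside another counter."""
    if isinstance(r, Count):
        return not has_counting(r.body)
    if isinstance(r, (Cat, Alt)):
        return is_flat(r.left) and is_flat(r.right)
    if isinstance(r, Star):
        return is_flat(r.body)
    return True
```

For example, `Count(Cat(Sym("a"), Sym("b")), 10, 10)` models (ab){10}, which has two character occurrences and flat counting.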

A *(nondeterministic) automaton (NA)* is a tuple $A = (Q, \Delta, I, F)$ where $Q$ is a set of *states*, $\Delta$ is a set of *transitions* of the form $q \xrightarrow{a} r$ with $q, r \in Q$ and $a \in \Sigma$, $I \subseteq Q$ is the set of *initial states*, and $F \subseteq Q$ is the set of *final states*. A run of $A$ over a word $w = a\_1 \ldots a\_n$ from state $p\_0$ to $p\_n$, $n \geq 0$, is a sequence of transitions $p\_0 \xrightarrow{a\_1} p\_1$, $p\_1 \xrightarrow{a\_2} p\_2$, ..., $p\_{n-1} \xrightarrow{a\_n} p\_n$ from $\Delta$. The empty sequence is a run with $p\_0 = p\_n$ over $\varepsilon$. The run is *accepting* if $p\_0 \in I$ and $p\_n \in F$, and the language $L(A)$ of $A$ is the set of all words for which $A$ has an accepting run. A state $q$ is *reachable* if there is a run from $I$ to it. The *size* of the NA, $|A|$, is defined as the number of its states plus the number of its transitions. The automaton is *deterministic (DA)* iff $|I| = 1$ and for every state $q$ and symbol $a$, $\Delta$ has at most one transition $q \xrightarrow{a} r$. The *subset construction* transforms the NA to the DA with the same language $\mathrm{DA}(A) = (Q', \Delta', I', F')$ where $Q' \subseteq \mathcal{P}(Q)$ and $\Delta'$ are the smallest sets of states and transitions satisfying $I' = \{I\}$, $\Delta'$ has for each $a \in \Sigma$ and each $S \in Q'$ the transition $S \xrightarrow{a} \{ s' \mid s \in S \land s \xrightarrow{a} s' \in \Delta \}$, and $F' = \{ S \in Q' \mid S \cap F \neq \emptyset \}$. When the set of states $Q$ is finite, we talk about (deterministic) *finite state* automata (NFA, DFA).<sup>1</sup>
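The subset construction can be sketched with a standard worklist algorithm. The representation below (a dictionary from state and letter pairs to successor sets) and the function name `subset_construction` are assumptions made for illustration.

```python
def subset_construction(delta, initial, final, alphabet):
    """Determinize an NFA by exploring reachable subsets only.

    `delta` maps (state, letter) to a set of successor states.
    Returns (states, transitions, start, accepting) of the resulting DFA.
    """
    final = frozenset(final)
    start = frozenset(initial)
    states, transitions, worklist = {start}, {}, [start]
    while worklist:
        subset = worklist.pop()
        for a in alphabet:
            # all a-successors of the states in the current subset
            target = frozenset(r for q in subset for r in delta.get((q, a), ()))
            transitions[(subset, a)] = target
            if target not in states:
                states.add(target)
                worklist.append(target)
    accepting = {s for s in states if s & final}  # subsets meeting a final state
    return states, transitions, start, accepting
```

Run on a direct NFA for the regex `.*a.{3}` over the alphabet {a, b} (a looping initial state followed by a chain of four states), this sketch produces 16 reachable subsets, in line with the exponential growth in the counter bound discussed in the introduction.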

This paper is concerned with the problem of fast *pattern matching*, basically a membership test: given a regex $R$ and a text $w$, decide whether $w \in L(R)$. While $w$ may be very long, $R$ is normally small, hence the dependence on $|w|$ is the major factor in the complexity. The offline DFA simulation takes time linear in $|w|$. It (1) compiles $R$ into an NFA $\mathrm{NFA}(R)$, (2) determinizes it, and (3) follows the DFA run over $w$ (aka *simulates* the DFA on $w$), all in time and space $\Theta(2^{|\mathrm{NFA}(R)|} + |w|)$. The cost of determinization, exponential in $|\mathrm{NFA}(R)|$, is however too impractical. Modern matchers such as Grep or RE2 [19,17] therefore use the techniques of online DFA simulation, where only the part of the DFA used for processing $w$ is constructed. It reduces the complexity to $O(\min(2^{|\mathrm{NFA}(R)|} + |w|, |w| \cdot |\mathrm{NFA}(R)|))$ (the first operand of $\min$ is the explicit determinization in case the entire DFA is constructed, plus the cost of DFA-simulation; the second operand is the cost of the online-DFA simulation, coming from the fact that every step may incur construction of a new DFA state and transition in time $O(|\mathrm{NFA}(R)|)$). For counting regexes, the factor $|\mathrm{NFA}(R)|$ depends linearly (or more if counting is nested) on $\max\_R$ and thus exponentially on $|R|$. This makes counting very problematic in practice [36,37,33]. We will present a matching algorithm which is *fast* for a specific class of regexes, meaning that its run-time is still linear in $|w|$ but is independent of $\max\_R$.
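The idea of online DFA simulation can be sketched as a subset construction with a transition cache that is filled only along the run over the text. The function name `online_match` and the NFA representation are assumptions; production matchers such as those cited above are considerably more elaborate.

```python
def online_match(delta, initial, final, word):
    """Lazily determinize an NFA while reading `word`.

    `delta` maps (state, letter) to a set of successor states.
    Returns (accepted, number of DFA transitions actually constructed).
    """
    final = frozenset(final)
    cache = {}  # (subset, letter) -> successor subset, built on demand
    current = frozenset(initial)
    for a in word:
        key = (current, a)
        if key not in cache:
            # Constructing one DFA transition costs time linear in the NFA size.
            cache[key] = frozenset(
                r for q in current for r in delta.get((q, a), ())
            )
        current = cache[key]
    return bool(current & final), len(cache)
```

Each step either reuses a cached transition or constructs a new one, so only the part of the DFA visited by the text is ever built.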

## 3 Counting Automata

We use a rephrased definition of counting automata and counting-set automata of [36]. We will present them as a special case of a generic notion of automata with registers.

Definition 1 (Automata with registers). *An* automaton with registers *(RA) operated through an f.o.l. $\Gamma$ under an interpretation $\mathcal{I}$ is a tuple $A = (X, Q, \Delta, I, F)$ where $X$ is a set of variables called* registers*; $Q$ is a finite set of* states*; $\Delta$ is a finite set of* transitions *of the form $q \xrightarrow{a, \varphi, u} p$ where $p, q \in Q$, $a \in \Sigma$, $u : X \to \mathrm{Terms}\_{\Gamma,X}$ is an* update*, and $\varphi \in \mathrm{QFF}\_{\Gamma,X}$ is a* guard*; $I$ is a set of* initial configurations*, where a* configuration *is a pair of the form $(q, \mathfrak{m})$ where $q \in Q$ and $\mathfrak{m} : X \to D\_{\mathcal{I}}$ is a register assignment called a* memory*; and $F : Q \to \mathrm{QFF}\_{\Gamma,X}$ is a* final condition assignment*.*

<sup>1</sup> We do not require finiteness in the basic definition in order to avoid artificial restrictions of the notions of automata with registers/counters/counting sets defined later.

*The language of $A$, $L(A)$, is defined as the language of its* configuration automaton *$\mathrm{Conf}(A)$. States of $\mathrm{Conf}(A)$ are* configurations *of $A$ that are reachable. $I$ is the set of initial states of $\mathrm{Conf}(A)$. It has a transition $(q, \mathfrak{m}) \xrightarrow{a} (q', \mathfrak{m}')$ iff $(q, \mathfrak{m})$ is reachable and $A$ has a transition $\delta = q \xrightarrow{a, \varphi, u} q' \in \Delta$ such that $(q', \mathfrak{m}')$ is the* image *of $(q, \mathfrak{m})$ under $\delta$, denoted $(q', \mathfrak{m}') = \delta(q, \mathfrak{m})$, meaning that (1) $\delta$ is* enabled *in $(q, \mathfrak{m})$, $\mathfrak{m} \models \varphi$, and (2) $\mathfrak{m}' = u(\mathfrak{m})$, i.e. $\mathfrak{m}'(x) = u(x)^{\mathcal{I}}(\mathfrak{m})$ for each $x \in X$. We let $\delta(C) = \{ \delta(c) \mid c \in C \}$ for a set of configurations $C$. A configuration $(q, \mathfrak{m})$ is* final *if $\mathfrak{m} \models F(q)$. By* runs of $A$ *we mean runs of $\mathrm{Conf}(A)$. The RA $A$ is* deterministic *if $\mathrm{Conf}(A)$ is deterministic. The size of the RA is $|A| = |Q| + \sum\_{\delta \in \Delta} |\delta|$ where $|\delta|$ is the sum of the sizes of the update and the guard.*

Definition 2 (Counting automata). *A* counting automaton *(CA) is an automaton with registers, called* counters*, operated through the* counting language *$\Gamma\_{cnt}$ that contains the unary increment function, denoted $x + 1$, constants $0$ and $1$, and predicates $x > k$ and $x \leq k$, $k \in \mathbb{N}$, with the standard interpretation over natural numbers, that we denote $\mathcal{I}\_{cnt}$.*
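As an illustration of Definitions 1 and 2, the sketch below simulates a counting automaton through its configurations, i.e., pairs of a state and a memory. Everything in it is an assumption made for the example: guards and updates are represented as Python functions, memories as tuples of counter values, and the small hand-written CA for a{2,3} is not claimed to be the output of the construction used in the paper.

```python
def ca_accepts(transitions, initial, final, word):
    """Run a counting automaton on `word` by tracking reachable configurations.

    transitions: list of (q, letter, guard, update, p), where guard maps a
                 memory to a bool and update maps a memory to a new memory.
    initial:     set of configurations (state, memory).
    final:       dict mapping each state to its final condition on memories.
    """
    configs = set(initial)
    for a in word:
        configs = {
            (p, update(mem))
            for (q, mem) in configs
            for (q1, letter, guard, update, p) in transitions
            if q1 == q and letter == a and guard(mem)  # transition is enabled
        }
    return any(final[q](mem) for (q, mem) in configs)


# Hypothetical CA for a{2,3} with one counter c, stored as memory (c,).
TRANSITIONS = [
    ("s0", "a", lambda m: True, lambda m: (1,), "s1"),               # c := 1
    ("s1", "a", lambda m: m[0] <= 2, lambda m: (m[0] + 1,), "s1"),   # c := c + 1
]
INITIAL = {("s0", (0,))}
FINAL = {"s0": lambda m: False, "s1": lambda m: m[0] > 1}
```

The guards and final conditions use only the predicates $x \leq k$ and $x > k$ of the counting language, and the number of distinct configurations grows with the counter bound, which is the source of the blow-up in explicit determinization.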

Regexes with counting may be translated to CA by several methods ([36,33,14,23]). We use a slightly adapted version of [14]—an extension of Glushkov's algorithm [16] to counting. For a regex *R*, it produces a CA CA(*R*) = (*X*,*Q*,Δ,{α0},*F*). Figure 1 shows an example of such CA. The construction is discussed in detail in [21], here we only overview the important properties needed in Sections 4-6:

Fig. 1: CA(*R*) for *R* = ((a|b)b){3,8}. The accepting condition of all states is ⊥ except for *b*<sup>2</sup> whose accepting condition is written in the square brackets.


A DFA can be obtained by the subset construction in the form DA(Conf(CA(*R*))), called *explicit determinization*. Due to the factor max<sub>*R*</sub> in the size of Conf(CA(*R*)), the explicit determinization is exponential in max<sub>*R*</sub> even if *R* is flat, meaning doubly exponential in |*R*| (*R* has max<sub>*R*</sub> written as a decimal numeral). If *R* is not flat, then the factor max<sub>*R*</sub> is replaced by (max<sub>*R*</sub>)<sup>*cnt*<sub>*R*</sub></sup>.

## 4 Counter-subset Construction

In this section, we formulate a modified version of determinization of CA from [36] that constructs a machine of a size independent of max*R*. Our version handles the entire class of Cartesian CA (defined below) and in turn also all regexes with flat counting.

The main idea of the determinization remains the same as in [36]. The standard subset construction is augmented with registers, which we call *counting sets*, that can store sets of counter values that would be generated by non-deterministic runs of the CA. The automata with counting sets as registers are called *counting-set automata*. Our first modification of [36] is the indexing of counters by states. Intuitively, this allows handling cases such as a\*(ba|ab){5}, where, after reading the first *ab*, the counter is either incremented or not (*b* is the first letter of the counted sub-expression or not). This would violate the uniformity property of CA necessary in [36]: the set of values generated by the non-deterministic CA runs must be the same for every CA state. In our modified version, values at distinct states are stored separately in registers indexed by those states and may differ. Then, in order to handle the indexed counters, we have to introduce a general assignment of counters, which allows assigning the *union* of other counters.<sup>2</sup> Intuitively, when a run non-deterministically branches into several states, each branch needs to continue with its *own copy* of the set, stored in a counter indexed by the state. The union of sets is used when the branches join again. This brings a technical challenge that we solve in this work: how to simulate the counting-set automata fast when the set union and copy are used? The solution is presented in Sections 5 and 6.

Definition 3 (Counting-set automata). *A* counting-set automaton (CSA) *is an automaton with registers operated through the* counting-set language Γ<sub>set</sub> *under the* number-set interpretation I<sup>{}</sup><sub>cnt</sub> *where the language* Γ<sub>set</sub> *extends the counting language* Γ<sub>cnt</sub> *with the constant* ∅*, binary union* ∪*, and set-filter functions* ∇<sub>*p*</sub> *where p is a predicate symbol of* Γ<sub>cnt</sub>*. For simplicity, we restrict terms assigned to counters by transition updates to the form t* = *t*<sub>1</sub> ∪···∪ *t<sub>n</sub> where each t<sub>i</sub> is either (a) a term of* Γ<sub>cnt</sub> *or* ∅*, or (b) of the form* ∇<sub>*p*</sub>(*t*) *where t is a term of* Γ<sub>cnt</sub>*. Each t<sub>i</sub> is called an r*-term *of t.*

*The domain of* I<sub>set</sub> *is* sets of natural numbers*, P*(N)*. The interpretation of the predicates and functions of* Γ<sub>cnt</sub> *under* I<sub>set</sub> *is derived from the base number interpretation of the same predicates and functions: A function returns the image of the set in the argument under the base semantics, f*<sup>I<sub>set</sub></sup>(*S*) = {*f*<sup>I<sub>cnt</sub></sup>(*n*) | *n* ∈ *S*}*. A set satisfies a predicate if some of its elements satisfy the base semantics of that predicate, p*<sup>I<sub>set</sub></sup>(*S*) ⇐⇒ ∃*e* ∈ *S* : *p*<sup>I<sub>cnt</sub></sup>(*e*)*. Filters then filter out values that do not satisfy the base semantics of their predicate,* ∇<sub>*p*</sub><sup>I<sub>set</sub></sup>(*S*) = {*e* ∈ *S* | *p*<sup>I<sub>cnt</sub></sup>(*e*)}*. Finally,* ∅ *is interpreted as the empty set and* ∪ *as the union of sets. We denote memories of the CSA by* s *to distinguish them from memories of CA. We write DCSA to abbreviate deterministic CSA.*

<sup>2</sup> [36] could assign to a counter *x* only a constant or function of the current value of *x*.

Less formally, registers of CSA hold sets of numbers and are manipulated by the increment *x*+1 of all values, assignment of constant sets {0}, {1}, and ∅, denoted by 0, 1, and ∅, filtering out values smaller or larger than a constant, denoted ∇<sub>*x* ≥ *k*</sub>(*x*) and ∇<sub>*x* < *k*</sub>(*x*), and testing for the presence of a value *x* satisfying *x* ≥ *k* or *x* < *k*, *k* ∈ N.
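The set-level semantics described above can be sketched in a few lines of Python. This is only an illustration under our own naming: the helpers `increment`, `holds`, `filter_set`, `lt`, and `ge` are hypothetical and do not come from the paper, and sets are modeled naively as `frozenset` objects rather than by the efficient data structure of Section 5.

```python
from typing import Callable, FrozenSet

NumSet = FrozenSet[int]

def increment(s: NumSet) -> NumSet:
    """x + 1 lifted to sets: the image of the set under the base increment."""
    return frozenset(n + 1 for n in s)

def holds(pred: Callable[[int], bool], s: NumSet) -> bool:
    """A set satisfies a predicate iff some of its elements satisfies it."""
    return any(pred(n) for n in s)

def filter_set(pred: Callable[[int], bool], s: NumSet) -> NumSet:
    """The filter keeps exactly the elements that satisfy the base predicate."""
    return frozenset(n for n in s if pred(n))

def lt(k: int) -> Callable[[int], bool]:
    """Base predicate x < k."""
    return lambda n: n < k

def ge(k: int) -> Callable[[int], bool]:
    """Base predicate x >= k."""
    return lambda n: n >= k

s = frozenset({0, 3, 7})
# Both x < 4 and x >= 4 hold for s, since predicates are existential on sets.
both = holds(lt(4), s) and holds(ge(4), s)
```

Note that, because predicates are interpreted existentially, a set can satisfy a predicate and its complement at the same time, and the empty set satisfies neither.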

We will present an algorithm that determinizes a CA *A* = (*X*,*Q*,Δ,*I*,*F*), fixed for the rest of the section, into a DCSA DCSA(*A*) = (*X*<sup>{}</sup>,*Q*<sup>{}</sup>,Δ<sup>{}</sup>,*I*<sup>{}</sup>,*F*<sup>{}</sup>). We assume that guards of transitions in Δ and final conditions are of the form ⋀<sub>*x*∈*Y*</sub> *p<sub>x</sub>*[*x*], *Y* ⊆ *X*, i.e. conjunctions with at most a single atomic predicate per counter. This is satisfied by all CA(*R*), for any regex *R* (see the list of properties of CA(*R*) in Section 3).<sup>3</sup>

Runs of DCSA(*A*) will *encode* runs of DA(Conf(*A*)) obtained from the explicit determinization of *A*. Recall that the states of DA(Conf(*A*)) are sets of configurations of *A*, i.e., pairs (*q*,m) of a state and a counter assignment. DCSA(*A*) will represent the sets of counter values within a DA state as run-time values of its registers.

In particular, for every state *q* and counter *x* of the CA, DCSA(*A*) has a register *x<sub>q</sub>* in which it remembers, after reading a word *w*, the set of all values that *x* reaches in runs of the base CA on *w* ending in *q*. Hence, we have *X*<sup>{}</sup> = {*x<sub>q</sub>* | *x* ∈ *X* ∧ *q* ∈ *Q*}.

Definition 4 (Encoding of sets of CA configurations). *A state S* = {(*q<sub>i</sub>*,m<sub>*i*</sub>)}<sup>*n*</sup><sub>*i*=1</sub> *of* DA(Conf(*A*)) *is encoded as the* DCSA(*A*) *configuration enc*(*S*) = ({*q<sub>i</sub>*}<sup>*n*</sup><sub>*i*=1</sub>, s) *where* s(*x<sub>q</sub>*) = {m<sub>*i*</sub>(*x*) | *q<sub>i</sub>* = *q*}<sup>*n*</sup><sub>*i*=1</sub>*.*

Since a set of assignments appearing with the state *q* is broken down to sets of values of the individual counters, the encoding disregards relations between values of different counters. For instance, in the DA state *S*<sub>1</sub> = {(*q*,{*x* → 0, *y* → 0}),(*q*,{*x* → 1, *y* → 1})}, the values of *x* and *y* are either both 0 or both 1, but *enc*(*S*<sub>1</sub>) = ({*q*},{*x<sub>q</sub>* → {0,1}, *y<sub>q</sub>* → {0,1}}) does not retain this information. It is identical to the encoding of another DA state *S*<sub>2</sub> = {(*q*,{*x* → 1, *y* → 0}),(*q*,{*x* → 0, *y* → 1})}. This is the same loss of information as in the so-called Cartesian abstraction. The encoding is hence precise and unambiguous only when we assume that inside the states of DA(Conf(*A*)), the relations between counters are always unrestricted, so that there is no information to be lost. We then call the CA *Cartesian*, as defined below. The encoding function is then unambiguous, and we call the inverse function *decoding*, denoted *dec*.

Definition 5 (Cartesian CA). *Assuming the set of counters of A is X* = {*x<sub>i</sub>*}<sup>*m*</sup><sub>*i*=1</sub>*, then a set C of configurations of A is* Cartesian *iff, for every state q of A, there exist sets N*<sub>1</sub>,...,*N<sub>m</sub>* ⊆ N *such that* (*q*,{*x<sub>i</sub>* → *n<sub>i</sub>*}<sup>*m*</sup><sub>*i*=1</sub>) ∈ *C iff* (*n*<sub>1</sub>,...,*n<sub>m</sub>*) ∈ *N*<sub>1</sub> ×···× *N<sub>m</sub>. The CA A is* Cartesian *iff all states of* DA(Conf(*A*)) *are Cartesian.*

For instance, the DA states *S*<sub>1</sub> and *S*<sub>2</sub> above are not Cartesian, while *S*<sub>1</sub> ∪ *S*<sub>2</sub> is.
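The encoding and the Cartesian property can be checked mechanically on small examples. The following Python sketch is our own illustration: the functions `enc`, `dec`, and `is_cartesian` and the data layout (a DA state as a list of pairs of a state name and a dictionary of counter values) are assumptions made for the example, not the paper's implementation. It tests the Cartesian property by checking whether decoding the encoding gives back the original set.

```python
from itertools import product

def enc(S, counters):
    """Encode a set of configurations as (set of states, {(x, q): set of values})."""
    states = frozenset(q for q, _ in S)
    sets = {}
    for q, m in S:
        for x in counters:
            sets.setdefault((x, q), set()).add(m[x])
    return states, sets

def dec(encoded, counters):
    """Rebuild configurations as the Cartesian product of the per-counter sets."""
    states, sets = encoded
    out = set()
    for q in states:
        for values in product(*(sorted(sets[(x, q)]) for x in counters)):
            out.add((q, tuple(zip(counters, values))))
    return out

def is_cartesian(S, counters):
    """S is Cartesian iff encoding followed by decoding loses nothing."""
    normalized = {(q, tuple((x, m[x]) for x in counters)) for q, m in S}
    return dec(enc(S, counters), counters) == normalized

X = ("x", "y")
S1 = [("q", {"x": 0, "y": 0}), ("q", {"x": 1, "y": 1})]
S2 = [("q", {"x": 1, "y": 0}), ("q", {"x": 0, "y": 1})]

same_encoding = enc(S1, X) == enc(S2, X)   # the encoding cannot tell S1 from S2
```

On these inputs, `S1` and `S2` receive the same encoding and neither is Cartesian, whereas their union `S1 + S2` is.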

Similarly to the regex-to-CA construction of [36], our regex-to-CA construction discussed in Section 3 returns a Cartesian CA when called on a flat regex.

<sup>3</sup> Every CA can be transformed to this form by transforming the formulae to DNF and creating clones of transitions/states for individual clauses.

*Subset construction for Cartesian CA.* The algorithm below is a generalization of the subset construction. Let us denote by index<sub>*q*</sub>(*t*) the term that arises from *t* by replacing every variable *x* ∈ *X* by *x<sub>q</sub>*, and analogously index<sub>*q*</sub>(ϕ) for formulas. We have *Q*<sup>{}</sup> ⊆ *P*(*Q*), the initial configuration *I*<sup>{}</sup> = {*enc*(*I*)}, and the final conditions assign to *R* ∈ *Q*<sup>{}</sup> the disjunction of the final conditions of its elements, *F*<sup>{}</sup>(*R*) = ⋁<sub>*q*∈*R*</sub> index<sub>*q*</sub>(*F*(*q*)).

We will construct DCSA(*A*) so that it is deterministic and its runs encode the runs of the DA DA(Conf(*A*)); Conf(DCSA(*A*)) will be isomorphic to DA(Conf(*A*)). For that, we need for each transition δ of DA(Conf(*A*)) one unique transition of DCSA(*A*) over the same letter enabled in the encoding of the source of δ and generating the encoding of the target of δ. In other words, we need for each transition *dec*(*R*, s) −{*a*}→ *dec*(*R*′, s′) of DA(Conf(*A*)) one unique transition δ′ = *R* −{*a*,ϕ,*u*}→ *R*′ ∈ Δ<sup>{}</sup> with (*R*′, s′) = δ′(*R*, s). That transition δ′ will be built by summarizing the effect of all base CA *a*-transitions enabled in the CA configurations of *dec*(*R*, s).

To construct the transition δ′, we first translate each base transition δ = *q* −{*a*,ϕ<sub>δ</sub>,*u*<sub>δ</sub>}→ *r* ∈ Δ into its set-version δ<sup>{}</sup>, supposed to transform an encoding of a (Cartesian) set *C* of configurations, *enc*(*C*), into the encoding of the set of their images under δ, *enc*(δ(*C*)), and enabled if δ is enabled for at least one configuration in *C*. To that end, assuming ϕ<sub>δ</sub> = ⋀<sub>*x*∈*X*</sub> *p<sub>x</sub>*[*x*], we (1) construct the update *u*<sup>∇</sup><sub>δ</sub> from *u*<sub>δ</sub> by substituting in every *u*<sub>δ</sub>(*x*), *x* ∈ *X*, the variables *y* ∈ *X* by their filtered versions ∇<sub>*p<sub>y</sub>*</sub>(*y*), and (2) add indices to registers that mark the current state, resulting in the transition δ<sup>{}</sup> = *q* −{*a*,ϕ<sup>{}</sup><sub>δ</sub>,*u*<sup>{}</sup><sub>δ</sub>}→ *r* where ϕ<sup>{}</sup><sub>δ</sub> = index<sub>*q*</sub>(ϕ<sub>δ</sub>) and *u*<sup>{}</sup><sub>δ</sub> assigns to every *x<sub>r</sub>*, *x* ∈ *X*, the term index<sub>*q*</sub>(*u*<sup>∇</sup><sub>δ</sub>(*x*)).

The states *Q*<sup>{}</sup> and the transitions Δ<sup>{}</sup> are then constructed as the smallest sets satisfying that *enc*(*I*) ∈ *Q*<sup>{}</sup> and every *R* ∈ *Q*<sup>{}</sup> has for every *a* ∈ Σ the outgoing transitions constructed as follows. Let {*q<sub>j</sub>* −{*a*,ϕ<sub>*j*</sub>,*u<sub>j</sub>*}→ *r<sub>j</sub>*}<sub>*j*∈*J*</sub>, for some index set *J*, be the set of *constituent a-transitions* for *R*, i.e., all *a*-transitions δ<sup>{}</sup> where δ ∈ Δ originates in *R*. To achieve determinism, Δ<sup>{}</sup> has the transition *R* −{*a*,ψ,*u*′}→ *R*′ for every minterm ψ ∈ *Minterms*({ϕ<sub>*j*</sub>}<sub>*j*∈*J*</sub>). The update *u*′ and target *R*′ are constructed from the set {*q<sub>j</sub>* −{*a*,ϕ<sub>*j*</sub>,*u<sub>j</sub>*}→ *r<sub>j</sub>*}<sub>*j*∈*K*</sub>, *K* ⊆ *J*, of constituent transitions with guards ϕ<sub>*j*</sub> compatible with the minterm ψ, i.e., with satisfiable ψ ∧ ϕ<sub>*j*</sub>. *R*′ is the set of their target states, *R*′ = {*r<sub>j</sub>*}<sub>*j*∈*K*</sub>, and *u*′(*x*) unites all their update terms *u<sub>j</sub>*(*x*), i.e. *u*′(*x*) = ⋃<sub>*j*∈*K*</sub> *u<sub>j</sub>*(*x*), for each *x* ∈ *X*<sup>{}</sup>.

*Example 1.* When showing examples of transition updates, we write *x* := *t* to denote that *u*(*x*) = *t* and we omit the assignments *x* := ∅ in CSA.

Let *R* = {*p*,*q*} and let the *a*-transitions originating at *R* be *q* −{*a*,⊤,*x*:=*x*}→ *s*, *p* −{*a*,*x* < *n*,*x*:=*x*+1}→ *r*, and *p* −{*a*,*x* ≥ *m*,*x*:=1}→ *s*. They induce three constituent transitions for *R* and *a*: *q* −{*a*,⊤,*x<sub>s</sub>*:=*x<sub>q</sub>*}→ *s*, *p* −{*a*,*x<sub>p</sub>* < *n*,*x<sub>r</sub>*:=∇<sub>*x* < *n*</sub>(*x<sub>p</sub>*)+1}→ *r*, and *p* −{*a*,*x<sub>p</sub>* ≥ *m*,*x<sub>s</sub>*:=1}→ *s*. A transition *R* −{*a*,ψ,*u*′}→ *R*′ is constructed for each of the following minterms ψ: *x<sub>p</sub>* < *n* ∧ *x<sub>p</sub>* ≥ *m*, ¬*x<sub>p</sub>* < *n* ∧ *x<sub>p</sub>* ≥ *m*, *x<sub>p</sub>* < *n* ∧ ¬*x<sub>p</sub>* ≥ *m*, ¬*x<sub>p</sub>* < *n* ∧ ¬*x<sub>p</sub>* ≥ *m*. For the first one, all three constituent transitions are compatible and so the update *u*′ is *x<sub>r</sub>* := ∇<sub>*x* < *n*</sub>(*x<sub>p</sub>*)+1; *x<sub>s</sub>* := *x<sub>q</sub>* ∪ 1 (the update of *x<sub>r</sub>* is taken from the single constituent transition leading to *r*, the update of *x<sub>s</sub>* is the union of the updates of the two transitions leading to *s*) and the target state is *R*′ = {*r*,*s*}.
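The minterm-based construction of Example 1 can be replayed with a short Python sketch. It is a simplification under stated assumptions: guards are modeled as sets of positive atomic predicates (strings), every sign combination of the atoms is treated as a satisfiable minterm, and a guard is taken to be compatible with a minterm iff none of its atoms is negated there. The function `dcsa_transitions` and the string encoding of terms are hypothetical names introduced only for this illustration.

```python
from itertools import product

# Constituent a-transitions for R = {p, q}: (source, guard atoms, target, update).
constituent = [
    ("q", frozenset(),           "s", {"x_s": "x_q"}),
    ("p", frozenset({"x_p<n"}),  "r", {"x_r": "filter[x<n](x_p)+1"}),
    ("p", frozenset({"x_p>=m"}), "s", {"x_s": "1"}),
]

def dcsa_transitions(constituent):
    """For each minterm, collect the target state set and the united updates."""
    atoms = sorted(set().union(*(g for _, g, _, _ in constituent)))
    result = {}
    for signs in product([True, False], repeat=len(atoms)):
        minterm = dict(zip(atoms, signs))
        # A guard is compatible iff all of its atoms occur positively in the minterm.
        compatible = [t for t in constituent if all(minterm[a] for a in t[1])]
        target = frozenset(r for _, _, r, _ in compatible)
        update = {}
        for _, _, _, u in compatible:
            for register, term in u.items():
                update.setdefault(register, []).append(term)  # list = union of r-terms
        result[signs] = (target, update)
    return atoms, result

atoms, transitions = dcsa_transitions(constituent)
target, update = transitions[(True, True)]   # minterm x_p<n AND x_p>=m
```

For the first minterm, the sketch yields the target {*r*, *s*} and the update in which `x_s` is the union of `x_q` and `1`, matching the example; for the last minterm only the unguarded transition from *q* remains.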

DCSA(*A*) is deterministic since it has a single initial configuration and the guards of transitions originating in the same state are minterms. The size of DCSA(*A*) depends only on the size of *A* and not on the interpretation of the language. In particular, when *A* is CA(*R*) for some regex *R*, the size does not depend on max<sub>*R*</sub>. The theorem below is proved in [21].<sup>4</sup>

Theorem 1. DCSA(*A*) *is deterministic,* |DCSA(*A*)| ∈ *O*(2<sup>|*A*|</sup>)*, and if A is Cartesian, then L*(*A*) = *L*(DCSA(*A*))*.*

Since for regexes with flat counting, our regex to CA algorithm always returns a Cartesian CA, we can transform them into DCSA.

## 5 Fast Simulation of Counting-set Automata

In this section, we discuss how a run of a DCSA on a given word can be *simulated* efficiently to achieve fast matching. Let us fix a word *w* = *a*<sub>1</sub>···*a<sub>n</sub>* together with the DCSA *A* = (*X*,*Q*,Δ,{α<sub>0</sub>},*F*). We wish to construct the run of the DCSA on *w* and test whether the reached configuration is accepting. We aim at a running time linear in |*w*| and independent of the sizes of the sets stored in *A*'s registers at run-time.

We will assume that the initial configuration α<sub>0</sub> of *A* assigns to every register a singleton or the empty set. The assumption is satisfied by CSA constructed from CA(*R*), *R* being any regex, by the algorithms of Section 4 and also Section 6.<sup>5</sup>

Technically, the simulation maintains a configuration α = (*q*, s), initialized with α<sub>0</sub>, and for every *i* from 1 to *n*, it constructs the transition α −{*a<sub>i</sub>*}→ α′ of Conf(*A*) and replaces α by the successor configuration α′ = (*q*′, s′). We use the key ingredient of fast simulation from [36], the *offset-list data structure* for sets of numbers with constant time addition of 0 or 1, comparison of the maximum to a constant, reset, and increment of all values. The problem is that the newly added union and copy of sets are still linear in the size of the sets, and hence linear in the maximum counter bounds. We show how, under a condition introduced below, set copy can be avoided entirely and the cost of union can be amortized by the cost of incrementing the sets. This will again allow a CSA-simulation in time independent of max<sub>*A*</sub> and falling into *O*(|*A*|·|*w*|).

First, we define a property of CSA sufficient for fast simulation—that the updates on its transitions do not *replicate counters*.

Definition 6 (Counter replication). *We say that a CSA* replicates counters *if for some transition q* −{*a*,ϕ,*u*}→ *r, some counter appears in the image of u twice, that is, it appears in two r-terms of some u*(*x*) *or it appears in u*(*x*) *as well as in u*(*y*) *for x* ≠ *y. A* non-replicating *CSA does not replicate counters.*

For instance, {*x* → *x*; *y* → *x*+1} and {*x* → *x* ∪ *x*+1, *y* → *y*} are updates where *x* is replicated, whereas {*x* → *x*+1, *y* → *y*} is not a replicating update.

<sup>4</sup> It may be interesting to note that, as follows from our formulation of the determinization, the construction is independent of the particular f.o.l. used to manipulate registers and of its interpretation. The determinization could be applied to any kind of automata that fits the definition of automata with registers. The numbers could be manipulated by other functions and tests, natural numbers could be replaced by reals etc. The counting-set automata are themselves an instance of automata with registers. One could also think about push-down automata or, with small modifications, variants of data-word automata with registers.

<sup>5</sup> This is a technical assumption important in order for unions of the initial sets not to influence the overall complexity of the simulation.

*Offset-list data structure.* The *offset-list* data structure of [36] allows constant time implementation of the set operations of increment of all elements, reset to ∅ or {0} or {1}, addition of 0 or 1, and comparison of the maximum with a constant.

It assigns to every counter *x* ∈ *X* a pointer *ol*(*x*) to an *offset-list pair* (*o<sub>x</sub>*,*l<sub>x</sub>*) with the *offset o<sub>x</sub>* ∈ N and a sorted list *l<sub>x</sub>* = *m*<sub>1</sub>,...,*m<sub>k</sub>* of integers. The data structure implementing the list needs constant-time access to the first and the last element, forward and backward iteration of a pointer, and insertion/deletion at/before a pointer to an element. This is satisfied for instance by a doubly-linked list that maintains pointers to the first and the last element. The offset-list pair represents the set s(*x*) = {*m*<sub>1</sub>+*o<sub>x</sub>*,...,*m<sub>k</sub>*+*o<sub>x</sub>*}. Union of two such sets is still linear in their size, but we will show that if the CSA does not replicate counters, the cost of set unions can be amortized by the cost of increments.
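As an illustration of the offset-list pair, the following Python sketch stores the sorted list in a `collections.deque` instead of a doubly-linked list. This substitution is our assumption: a deque gives constant-time access and updates at both ends, which suffices for the operations shown here. The class `OffsetList` and its method names are hypothetical.

```python
from collections import deque

class OffsetList:
    """A set of naturals stored as a sorted deque of integers plus a shared offset."""

    def __init__(self, values=()):
        self.offset = 0
        self.items = deque(sorted(set(values)))

    def to_set(self):
        """The represented set {m + offset | m in items}."""
        return {m + self.offset for m in self.items}

    def increment(self):
        """Increment all represented values in O(1) by bumping the offset."""
        self.offset += 1

    def has_less_than(self, n):
        """Existential test x < n: only the minimum (list head) matters."""
        return bool(self.items) and self.items[0] + self.offset < n

    def has_at_least(self, n):
        """Existential test x >= n: only the maximum (list tail) matters."""
        return bool(self.items) and self.items[-1] + self.offset >= n

    def add_small(self, v):
        """Insert the value 0 or 1 in O(1); it can only belong near the head."""
        assert v in (0, 1)
        stored = v - self.offset
        smaller = []
        while self.items and self.items[0] < stored:
            smaller.append(self.items.popleft())  # at most one element (the value 0)
        if not self.items or self.items[0] != stored:
            self.items.appendleft(stored)
        while smaller:
            self.items.appendleft(smaller.pop())

ol = OffsetList([0])
ol.increment()      # {1}
ol.add_small(0)     # {0, 1}
ol.increment()
ol.increment()      # {2, 3}
ol.add_small(1)     # {1, 2, 3}
```

The stored integers may become negative (a value 0 inserted at offset 2 is stored as −2); only the sums with the offset are meaningful.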

*Finding the CSA transition and evaluating the update.* The first step of computing α′ from α is finding the transition *q* −{*a<sub>i</sub>*,ϕ,*u*}→ *q*′ ∈ Δ, the only *a<sub>i</sub>*-transition from *q* that is enabled, i.e. where s |= ϕ. The simplest algorithm iterates through the transitions of Δ and, for each of them, tests whether s satisfies its guard. The cost of evaluating an atomic counter predicate *p*, i.e., deciding whether s |= *p*, is constant: since the lists *l<sub>x</sub>* are sorted, we only need to access the first or the last element and the offset to decide *x* < *n* or *x* ≥ *n*, respectively. With that, the cost of evaluating ϕ is linear in the size of ϕ. The cost of the iteration through the transitions of Δ is then linear in the sum of their sizes, which is within *O*(|*A*|).

Having found *q* −{*a<sub>i</sub>*,ϕ,*u*}→ *q*′, we evaluate its update to compute s′ and obtain α′ as (*q*′, s′). We will explain the algorithm and argue that the amortized cost of computing s′ is in *O*(|*X*|). The update is evaluated by, for each *x* ∈ *X*, evaluating all r-terms in *u*(*x*), uniting the results, and assigning the union to *ol*(*x*).

First, we argue that evaluating an r-term *t* of *u*(*x*), i.e. computing *t*(s), is amortized constant time. Since the counters are non-replicating, we can compute the value of each r-term *t*[*y*] in situ. That is, we modify the offset-list pair (*o<sub>y</sub>*,*l<sub>y</sub>*) and return the pointer *ol*(*y*). The original value of *y* can be discarded after evaluating *t*[*y*] since *y* does not appear in any other r-term. There are 5 cases:

(1) If *t* is 0 or 1, then we return a pointer to a fresh offset-list pair with the offset 0 and the list containing only 0 or 1, respectively. This is done in constant time.

(2) If *t* is *y* ∈ *Y*, then we return *ol*(*y*).

(3) If *t* is *y*+1, then *oy* is incremented by one. This constant time implementation of the increment is the reason for pairing the lists with the offsets.

(4) If *t* is ∇*p*[*y*], then *ly* is filtered by the atomic predicate *p*. Filtering with the predicate *x* ≥ *n* uses the invariant of sortedness of *ly*. It is done by iterating the following steps: i) test whether the list head is smaller than *n*−*oy* and ii) if yes, remove the head, if not, terminate the iteration. Every iteration is constant time: The cost of the iterations which remove an element is amortized by the cost of additions of the element to the list. What remains is only the constant cost of the last iteration which detects an element greater or equal to *n*−*oy*, or that the list is empty. Filtering with *x* < *n* is analogous (the iterations test and remove the last element instead of the head).

(5) If *t* is ∇*p*(*y*)+1, then the construction for the constant increment is applied after the constant filter discussed above.
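Cases (4) and (5) can be sketched as follows. This is a minimal Python illustration, assuming that the offset-list pair is given as an integer offset together with a sorted `collections.deque` (our stand-in for the doubly-linked list); the function names `filter_ge`, `filter_lt`, and `eval_filter_increment` are hypothetical. Each function returns the number of removed elements to make the amortization argument visible: every removal pays for itself against the earlier insertion of the removed element.

```python
from collections import deque

def filter_ge(offset, items, n):
    """Keep represented values >= n: drop list heads while head + offset < n."""
    removed = 0
    while items and items[0] + offset < n:
        items.popleft()
        removed += 1
    return removed

def filter_lt(offset, items, n):
    """Keep represented values < n: drop list tails while tail + offset >= n."""
    removed = 0
    while items and items[-1] + offset >= n:
        items.pop()
        removed += 1
    return removed

def eval_filter_increment(offset, items, n):
    """Case (5): filter by x < n in place, then increment; returns the new offset."""
    filter_lt(offset, items, n)
    return offset + 1

offset, items = 3, deque([0, 2, 5])        # represents {3, 5, 8}
dropped_low = filter_ge(offset, items, 5)  # now represents {5, 8}
dropped_high = filter_lt(offset, items, 8) # now represents {5}
offset = eval_filter_increment(offset, items, 6)  # now represents {6}
```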

Next, we argue that computing the union of values of the r-terms in *u*(*x*) may be amortized by the cost of evaluating the increment terms. Let *l*<sub>1</sub>,...,*l<sub>n</sub>* be the offset-list representations of the values of the terms in *u*(*x*) computed by the algorithm above. The offset-list representation of their union is computed by a sequence of merges, as *merge*(*l*<sub>1</sub>,*merge*(*l*<sub>2</sub>,...*merge*(*l*<sub>*n*−1</sub>,*l<sub>n</sub>*)...)). In particular, given two pointers to offset-lists *l*, *l*′, *merge*(*l*,*l*′) implements their union: it chooses the offset-list that represents a set with the larger maximum, assume that it is *l*, and inserts the elements represented by the other list, *l*′, into it. We say that *l*′ *is merged into l*. This is done by the standard sorted-list merging in time *O*(|*l*′|) where |*l*′| is the length of *l*′. Since *l*′ is without duplicates and with minimum 0, *O*(|*l*′|) ⊆ *O*(max(*l*′)) where max(*l*′) is the maximal element.

The *O*(max(*l*′)) cost is amortized by the cost of evaluating increments. The offset-list pair at *l*′ has seen at least max(*l*′) − 1 increments since the only elements inserted into it are 0, 1, or, during a merge, elements from other sets smaller than max(*l*′). These increments of *l*′ are the budget used to pay for the merging of *l*′ into *l*. After the merge, the offset-list pair of *l*′ is discarded (as the CSA is non-replicating, it is no longer needed), hence the budget is used only once. Last, the assignment of the union to *x* is done by a constant time assignment of a pointer to the offset-list returned by the merge.
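The merge of two offset-list pairs can be sketched in Python as below. The representation (a tuple of an offset and a sorted `collections.deque`) and the helpers `maximum` and `merge` are our assumptions for illustration. The sketch touches only the elements of the larger list that do not exceed the maximum of the smaller one, so the amount of work is bounded in terms of that maximum; for brevity it uses `sorted` on this bounded portion instead of a hand-written linear merge.

```python
from collections import deque

def maximum(ol):
    """Largest represented value, or -1 for the empty set."""
    offset, items = ol
    return items[-1] + offset if items else -1

def merge(a, b):
    """Union of two offset-list pairs; the one with the smaller maximum is merged
    into the other one and emptied (it is discarded afterwards)."""
    big, small = (a, b) if maximum(a) >= maximum(b) else (b, a)
    big_off, big_items = big
    small_off, small_items = small
    bound = maximum(small)
    # Detach the prefix of `big` that can interleave with `small` (values <= bound).
    prefix = []
    while big_items and big_items[0] + big_off <= bound:
        prefix.append(big_items.popleft() + big_off)
    merged = sorted(set(prefix) | {m + small_off for m in small_items})
    # Put the merged values back at the head, relative to the offset of `big`.
    for value in reversed(merged):
        big_items.appendleft(value - big_off)
    small_items.clear()
    return big

a = (2, deque([0, 3, 8]))   # represents {2, 5, 10}
b = (0, deque([1, 5, 6]))   # represents {1, 5, 6}
result = merge(a, b)
```

Here `b` has the smaller maximum, so it is merged into `a`, which afterwards represents {1, 2, 5, 6, 10} while keeping its own offset.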

*Overall complexity of the simulation.* Let us define the cost *cost*(*x*) of manipulations with the counter *x* ∈ *X* during one step of the simulation as the sum of the costs of: (1) evaluating all r-terms containing *x*, (2) merging their offset-lists into other ones, (3) creating offset-lists for terms 0 or 1 in *u*(*x*) and merging them into other offset-lists, (4) the assignment of the result of *u*(*x*) to *x*. The cost of processing a single letter *a<sub>i</sub>* is then the sum ∑<sub>*x*∈*X*</sub> *cost*(*x*) and |*w*|·∑<sub>*x*∈*X*</sub> *cost*(*x*) is the cost of the entire simulation. Since the CSA is non-replicating and evaluating a single r-term is amortized constant time, the cost of (1) is in amortized constant time. The cost of (2) is amortized by increments from step (1). The creation and insertion of singletons in (3), at most two in *u*(*x*), is constant time. The pointer assignment in (4) is constant time. The *cost*(*x*) is therefore amortized constant time, the amortized time of evaluating the update *u* is in *O*(|*X*|), and the cost of the updates through the simulation is in *O*(|*X*|·|*w*|). The cost of choosing the transitions, by evaluating their guards, is in *O*(|*A*|·|*w*|) by the above analysis. Analogously, the cost of testing the accepting condition at the reached configuration is in *O*(|*A*|).

Theorem 2. *If A is non-replicating, then its simulation on w takes O*(|*A*|·|*w*|) *time.*

## 6 Augmented Determinization

In this section, we augment the subset construction from Section 4 with optimizations that prevent counter replication and hence extend the class of regexes that can be matched fast by simulation of the CSA. The optimizations are tailored to CA with the special properties of CA(*R*), for a regex *R*, listed in Section 3.

*Intuition for the optimizations.* The emergence of counter replication and means of its elimination in the augmented construction, by techniques of *counter sharing* and *increment postponing*, are illustrated on simplified fragments of CA in Figure 2.

Fig. 2: Sub-structures of CA that are sources of counter replication. Panel a) has states *r*, *q*, *s* and transitions labeled *a*; *x* := *x*+1 (twice) and *b*; *x* := *x* (twice). Panel b) has states *q*, *r* and transitions labeled *a*; *x* := 1, *a*; *x* := *x*, and *a*; *x* := *x*+1 (twice). Panel c) has states *r*, *q*, *s* and transitions labeled *a*; *x* := *x*+1, *a*; *x* := *x*, *b*; *x* := *x*, and *b*; *x* := *x*+1.

In a), DCSA(CA(*R*)) has transitions {*q*} −{*a*,*x<sub>r</sub>*:=*x<sub>q</sub>*+1,*x<sub>s</sub>*:=*x<sub>q</sub>*+1}→ {*r*,*s*} −{*b*,*x<sub>q</sub>*:=*x<sub>r</sub>*∪*x<sub>s</sub>*}→ {*q*}. The first transition replicates the entire content of *x<sub>q</sub>*, the second one unites the two sets. Both transitions are expensive. They can be optimized by detecting that the values of *x<sub>s</sub>* and *x<sub>r</sub>* are the same, being generated by *syntactically identical* updates, and storing the values in a *shared counter x*<sub>{*r*,*s*}</sub>. This would result in transitions {*q*} −{*a*,*x*<sub>{*r*,*s*}</sub>:=*x*<sub>{*q*}</sub>+1}→ {*r*,*s*} −{*b*,*x*<sub>{*q*}</sub>:=*x*<sub>{*r*,*s*}</sub>}→ {*q*}, with the replication and union eliminated.

Figure 2b) then illustrates why a counter *x<sub>P</sub>*, *P* ⊆ *Q*, represents the set of values shared between the original counters *x<sub>p</sub>*, *p* ∈ *P*. That is, *x<sub>P</sub>* does not always hold the entire sets stored in the counters *x<sub>p</sub>*, *p* ∈ *P*. If their values are not the same, it stores only their intersection. The value of each *x<sub>p</sub>* is then partitioned among several shared counters *x<sub>S</sub>* with *p* ∈ *S*. In b), DCSA(CA(*R*)) has transitions {*q*} −{*a*,*x<sub>q</sub>*:=*x<sub>q</sub>*;*x<sub>r</sub>*:=1}→ {*q*,*r*} −{*a*,*x<sub>q</sub>*:=*x<sub>q</sub>*∪*x<sub>r</sub>*+1;*x<sub>r</sub>*:=1∪*x<sub>r</sub>*+1}→ {*q*,*r*}, replicating the counter *x<sub>r</sub>*. Counter sharing would then generate transitions {*q*} −{*a*,*x*<sub>{*q*}</sub>:=*x*<sub>{*q*}</sub>;*x*<sub>{*r*}</sub>:=1}→ {*q*,*r*} −{*a*,*x*<sub>{*q*}</sub>:=*x*<sub>{*q*}</sub>;*x*<sub>{*r*}</sub>:=1;*x*<sub>{*q*,*r*}</sub>:=*x*<sub>{*r*}</sub>+1}→ {*q*,*r*} with counters *x*<sub>{*q*}</sub>, *x*<sub>{*r*}</sub> for the subsets exclusive to *x<sub>q</sub>* and *x<sub>r</sub>*, respectively, and *x*<sub>{*q*,*r*}</sub> for the intersection.

Last, in c), we illustrate the technique of *increment postponing*. DCSA(CA(*R*)) would have transitions {*q*} −{*a*,*x<sub>r</sub>*:=*x<sub>q</sub>*+1,*x<sub>s</sub>*:=*x<sub>q</sub>*}→ {*r*,*s*} −{*b*,*x<sub>q</sub>*:=*x<sub>r</sub>*∪*x<sub>s</sub>*+1}→ {*q*}. Since the increments on the two branches happen at different moments, the values of *x<sub>r</sub>* and *x<sub>s</sub>* differ until the last increment of *x<sub>s</sub>* synchronizes them. We avoid replication by storing the non-incremented value, obtained from *x<sub>q</sub>*, in a counter shared by *x<sub>r</sub>* and *x<sub>s</sub>* and remembering that an increment of *x<sub>r</sub>* has been postponed. This is marked with + in the name of the shared counter *x*<sub>{*r*<sup>+</sup>,*s*}</sub>. When the values of *x<sub>r</sub>* and *x<sub>s</sub>* synchronize (the increment is applied to *x<sub>s</sub>* too), the postponed increment is evaluated and the +-mark is removed. We would create transitions {*q*} −{*a*,*x*<sub>{*r*<sup>+</sup>,*s*}</sub>:=*x*<sub>{*q*}</sub>}→ {*r*,*s*} −{*b*,*x*<sub>{*q*}</sub>:=*x*<sub>{*r*<sup>+</sup>,*s*}</sub>+1}→ {*q*}. If, before the synchronization, the value of the marked counter is either tested or incremented for the second time, we declare an *irresolvable replication* and abort the entire construction (we allow postponing of only one increment). To prevent this situation from arising needlessly, we let states remember the counters that must have the empty value and we ignore these counters.

*Augmented Determinization Algorithm.* The augmented determinization produces from CA(*R*) = (*X*,*Q*,Δ,{α<sub>0</sub>},*F*) the CSA DCSA<sup>a</sup>(CA(*R*)) = (*X*<sup>a</sup>,*Q*<sup>a</sup>,Δ<sup>a</sup>,{α<sup>a</sup><sub>0</sub>},*F*<sup>a</sup>). Its counters in *X*<sup>a</sup> are of the form *x<sub>S</sub>* where *x* ∈ *X* and *S* ⊆ *Q*<sup>+</sup> and *Q*<sup>+</sup> = *Q* ∪ {*q*<sup>+</sup> | *q* ∈ *Q*}. The guiding principle of the algorithm is that an assignment s<sup>a</sup> of *X*<sup>a</sup> represents an assignment s of the counters in *X*<sup>{}</sup> of DCSA(CA(*R*)), namely, for each *x<sub>q</sub>* ∈ *X*<sup>{}</sup>,

$$\mathfrak{s}(x_{q}) = \bigcup_{q \in S,\, S \subseteq Q^{+}} \mathfrak{s}^{\mathrm{a}}(x_{S}) \;\cup \bigcup_{q^{+} \in S,\, S \subseteq Q^{+}} \{n+1 \mid n \in \mathfrak{s}^{\mathrm{a}}(x_{S})\}. \tag{1}$$
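Equation (1) can be read as a decoding procedure. The Python sketch below is an illustration under our own conventions: a shared counter is keyed by a pair of the base counter name and a `frozenset` of state names, where a postponed increment is written by appending `"+"` to the state name. The function `decode` is hypothetical.

```python
def decode(shared, counter, state):
    """Value of counter x_q according to (1): values of shared counters x_S with
    q in S, plus incremented values of shared counters x_S with q+ in S."""
    result = set()
    for (x, S), values in shared.items():
        if x != counter:
            continue
        if state in S:
            result |= values
        if state + "+" in S:
            result |= {n + 1 for n in values}   # evaluate the postponed increment
    return result

# One shared counter x_{r+, s} holding {0, 4}.
shared = {("x", frozenset({"r+", "s"})): {0, 4}}
value_at_r = decode(shared, "x", "r")   # increment postponed at r is applied
value_at_s = decode(shared, "x", "s")   # stored values are read as they are
```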

We will use some simplifying notation. As discussed in Section 3, by the construction of CA(*R*), the increment of *x* and the guard *x* < max<sub>*x*</sub> always appear on its transitions together, without any other guard on *x*. Hence, in DCSA(CA(*R*)), all terms with an increment or filtering are of the form ∇<sub>*x* < max<sub>*x*</sub></sub>(*x*<sub>*q*◦</sub>)+1. We will denote them by the shorthand *x*<sub>*q*◦</sub> ⊕ 1 (we use *q*◦ to denote an element of the set *Q*<sup>+</sup>, either *q* or *q*<sup>+</sup>, for *q* ∈ *Q*).

The states of DCSA<sup>a</sup>(CA(*R*)) will additionally be distinguished according to which of the counters of *X*<sup>a</sup> are *active*, i.e., could have a non-empty value. Counters always valued by ∅ can be ignored, which simplifies transitions and decreases the chance of an irresolvable counter replication. The states of DCSA<sup>a</sup>(CA(*R*)) are thus of the form (*R*,*Act*) where *R* ⊆ *Q* and *Act* ⊆ *X*<sup>a</sup> is a set of active counters.

The initial configuration is α<sup>a</sup><sub>0</sub> = (({*q*<sub>0</sub>},{*x*<sub>{*q*<sub>0</sub>}</sub> | *x* ∈ *X*}), s<sup>a</sup><sub>0</sub>) where s<sup>a</sup><sub>0</sub> assigns {0} to every *x*<sub>{*q*<sub>0</sub>}</sub>, *x* ∈ *X*, and ∅ to every other counter in *X*<sup>a</sup>. The final condition assignment *F*<sup>a</sup>((*R*,*Act*)) is, for each (*R*,*Act*) ∈ *Q*<sup>a</sup>, constructed from *F*<sup>{}</sup>(*R*) by replacing every predicate *p*[*x<sub>q</sub>*] by the disjunction *p*[*x<sub>q</sub>*]<sup>*Act*</sup> = ⋁<sub>*x<sub>S</sub>*∈*Act*, *q*∈*S*</sub> *p*[*x<sub>S</sub>*] that encodes *p*[*x<sub>q</sub>*] using the counters of *Act* in the sense of (1).

The transitions in Δ<sup>a</sup> are constructed from transitions in Δ<sup>{}</sup>. For a source state (*R*,*Act*) ∈ *Q*<sup>a</sup>, with its set of active counters *Act* ⊆ *X*<sup>a</sup>, and an original transition *R* −{*a*,ϕ,*u*}→ *R*′ ∈ Δ<sup>{}</sup>, Δ<sup>a</sup> has the transition (*R*,*Act*) −{*a*,ϕ<sup>a</sup>,*u*<sup>a</sup>}→ (*R*′,*Act*′), constructed as follows:

The guard ϕ<sup>a</sup> is made from ϕ by replacing every predicate *p*[*x<sub>q</sub>*] by the equivalent version with shared counters *p*[*x<sub>q</sub>*]<sup>*Act*</sup> (as when constructing *F*<sup>a</sup> above).

The update *u*<sup>a</sup> is constructed in three steps. First, the update *u*<sup>sh</sup> is made from *u* by expressing the r-terms of *u* using the shared counters *X*<sup>a</sup>. Each *t*[*x<sub>q</sub>*] is replaced by

$$t^{\mathrm{a}} = \bigcup \left( \left\{ t[x_{S}] \mid x_{S} \in \mathit{Act},\ q \in S \right\} \cup \left\{ t[x_{S}] \oplus 1 \mid x_{S} \in \mathit{Act},\ q^{+} \in S \right\} \right).$$

Notice that all postponed increments are *evaluated* in *u*<sup>sh</sup>, i.e., transformed to normal increments. If *u*<sup>sh</sup> has an r-term *t* ⊕ 1 ⊕ 1, i.e., a double increment, then the whole construction aborts and declares an *irresolvable counter replication*. We allow postponing only one increment.<sup>6</sup> Otherwise, we proceed to resolve counter replication. First, we make sure that every counter appears in the image of the update only in one kind of r-term. We collect the set *Conflict* of all r-terms *x<sub>S</sub>* ⊕ 1 of *u*<sup>sh</sup> with *conflicting increments*, i.e. such that also *x<sub>S</sub>* is an r-term of *u*<sup>sh</sup>. In the update *u*<sup>+</sup>, conflicting increments are *postponed*. For *x* ∈ *X*, *q* ∈ *Q*, and *u*<sup>sh</sup>(*x<sub>q</sub>*) = ⋃*T*,

$$
u^{+}(x_{q}) = \bigcup \left( T \setminus \mathit{Conflict} \right) \quad \text{and} \quad u^{+}(x_{q^{+}}) = \bigcup \left\{ x_{S} \mid x_{S} \oplus 1 \in T \cap \mathit{Conflict} \right\}.
$$

The final update $u^{\mathbf{a}}$ then resolves counter replication, by grouping r-terms replicated in $u^{+}$ under a common l-value (we call $z$ an *l-value* of r-terms of $u^{+}(z)$). For an r-term $t$ of $u^{+}$, let $\mathrm{lval}(t)$ be the set of its l-values. Note that $\mathrm{lval}(t)$ is always of the form $\{x_{q^{\circ}}\}_{q^{\circ} \in S}$ for some fixed $x \in X$ (see property 4 of CA(*R*) in Section 3). We let $\mathit{Act}'$ be the set of counters $x_S$ with $\mathrm{lval}(t) = \{x_{q^{\circ}}\}_{q^{\circ} \in S}$ for some r-term $t$ of $u^{+}$. For all $x_S \in X^{\mathbf{a}}$, if $x_S \notin \mathit{Act}'$ then $u^{\mathbf{a}}(x_S) = \emptyset$, else

$$u^{\mathbf{a}}(x_S) = \bigcup \left\{ t \mid t \text{ is an r-term of } u^{+} \text{ and } \mathrm{lval}(t) = \{ x_{q^{\circ}} \}_{q^{\circ} \in S} \right\}.$$

<sup>6</sup> Also transition guards and final conditions of DCSAa(CA(*R*)) must not contain the +-mark since evaluating them regardless of the postponed increments would return incorrect results. However, declaring counter replication on seeing a double increment here covers these cases due to the structural properties of CA(*R*).

*Example 2.* Let us have $R \xrightarrow{a,\varphi,u} R'$ ∈ Δ{} created in Example 1 with $R = \{p,q\}$, $R' = \{r,s\}$, $\varphi = x_p < n \land x_p \geq m$, and $u = \{x_r := x_p \oplus 1,\ x_s := x_q \cup 1\}$. Let $\mathit{Act} = \{x_{\{p,q\}}, x_{\{p,q^{+}\}}\}$. Then $u^{\mathrm{sh}} = \{x_r := x_{\{p,q^{+}\}} \oplus 1 \cup x_{\{p,q\}} \oplus 1,\ x_s := x_{\{p,q^{+}\}} \oplus 1 \cup x_{\{p,q\}} \cup 1\}$. Note that the $x_q$ in $u(x_s)$ becomes $x_{\{p,q^{+}\}} \oplus 1$, corresponding to the right part of the definition of $t^{\mathbf{a}}$ (the postponed increment $x_{q^{+}}$ is evaluated in $u^{\mathrm{sh}}$). Note that the r-term $x_{\{p,q\}} \oplus 1$ is in *Conflict* as $x_{\{p,q\}}$ is an r-term of $u^{\mathrm{sh}}$ too. Therefore it is postponed in $u^{+}$, i.e. $u^{\mathrm{sh}}(x_r) = x_{\{p,q\}} \oplus 1 \cup \cdots$ becomes $u^{+}(x_{r^{+}}) = x_{\{p,q\}}$. We get $u^{+} = \{x_r := x_{\{p,q^{+}\}} \oplus 1,\ x_s := x_{\{p,q^{+}\}} \oplus 1 \cup x_{\{p,q\}} \cup 1,\ x_{r^{+}} := x_{\{p,q\}}\}$. Finally, $u^{\mathbf{a}}$ groups r-terms replicated in $u^{+}$ under a common l-value: $u^{\mathbf{a}} = \{x_{\{r,s\}} := x_{\{p,q^{+}\}} \oplus 1,\ x_{\{s\}} := 1,\ x_{\{s,r^{+}\}} := x_{\{p,q\}}\}$. The next active counters are $\mathit{Act}' = \{x_{\{r,s\}}, x_{\{s\}}, x_{\{s,r^{+}\}}\}$. Note that, for $x_{\{p,q^{+}\}}$, the postponed increment at $p^{+}$ was synchronized on this transition, while the conflict at $x_{\{p,q\}}$ was solved by postponing the increment and marking $r$ with $+$.

The algorithm either returns the CSA DCSA<sup>a</sup>(CA(*A*)), or detects an irresolvable counter replication, in which case DCSA<sup>a</sup>(CA(*A*)) does not exist.<sup>7</sup> Let *m* = |*R*| and recall that *n* denotes the length of the matched text, |*w*|. Since CA(*R*) has at most *m* states and *m*<sup>2</sup> transitions, a basic analysis of the algorithm's data structures reveals that the resulting CSA has at most 22*<sup>m</sup>* states, each with at most 2*m*<sup>2</sup> outgoing transitions, each transition of the size in *O*(*m*2*m*). Because DCSA<sup>a</sup>(CA(*A*)) encodes DCSA(CA(*A*)), it has the same language, and it also inherits its determinism. Since it does not replicate counters, it can be simulated fast in pattern matching, in time linear in the length of the text and independent of the counter bounds. The following theorem is proved in [21].

Theorem 3. *For R with flat counting, if* DCSA<sup>a</sup>(CA(*R*)) *exists, then it does not replicate counters, its size is in O*(22*<sup>m</sup> m*)*, L*(CA(*R*)) = *L*(DCSA<sup>a</sup>(CA(*R*)))*, and it can be simulated on a word w of the length n in time O*(22*mmn*)*.*

Matching can be done in time of constructing the CSA plus its simulation, which in the sum is indeed fast, not dependent on *k* and linear in *n*. It can also be noted that the *m* in the exponents above is not the size of the entire regex, but only the size of the counted sub-regexes.

## 7 Regexes with Synchronizing Counting

Finally, in this section we define the class of regexes with synchronizing counting, which precisely captures when the CSA created by our construction in Section 6 does not replicate counters and hence allows fast matching (in the sense of Theorem 3).

Definition 7 (Regexes with synchronizing counting). *A regex has* synchronizing counting *iff it has no sub-expression S*{n,m} *where for some k* ∈ ℕ*, a word from L*(*S*)<sup>*k*</sup> *has a prefix from L*(*S*)<sup>*k*+1</sup>*.*

<sup>7</sup> Aborting the construction here simplifies the description, but it would also be possible to continue the construction and return a DCSA that does not guarantee fast simulation.

For instance, (ac\*){1,4}(ab|ba){3,5}(a(ab)\*){2,8} is a regex with synchronizing counting as each word from *L*(ac\*)<sup>*k*</sup> must contain the symbol *a* exactly *k* times, words from *L*(ab|ba)<sup>*k*</sup> must have exactly 2*k* symbols, and words from *L*(a(ab)\*)<sup>*k*</sup> can be uniquely split at the first *a* in the a(ab)\*. In comparison, (a|aa){2,5} does not have synchronizing counting as *a* · *a* · *a* is a prefix of *aa* · *aa*.
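The condition of Definition 7 can be explored by brute force on small cases. The following is only an illustrative sketch, not the construction of Section 6: it assumes that *L*(*S*) is replaced by a finite sample of words, that *k* ranges over 1, …, `max_k`, and the helper names `power` and `find_violation` are hypothetical. A returned triple refutes synchronizing counting; a result of `None` only means that no violation was found within the bound.

```python
from itertools import product


def power(words, k):
    """All concatenations of exactly k words taken from a finite word set."""
    return {"".join(parts) for parts in product(words, repeat=k)}


def find_violation(words, max_k):
    """Search for k, w in L^k and v in L^(k+1) such that v is a prefix of w.

    `words` is a finite sample standing in for L(S) (an assumption of this
    sketch); only k = 1..max_k is explored.
    """
    for k in range(1, max_k + 1):
        longer = power(words, k + 1)
        for w in sorted(power(words, k)):
            for v in sorted(longer):
                if w.startswith(v):
                    return k, w, v
    return None


# (a|aa): the word "aa" is in L(S)^1 and also in L(S)^2 (as a . a).
print(find_violation({"a", "aa"}, 3))
# (ab|ba): all words have uniform length 2, so no violation can occur.
print(find_violation({"ab", "ba"}, 3))
# A finite sample of ac*: each word contains exactly one 'a'.
print(find_violation({"a", "ac", "acc"}, 3))
```

For (a|aa) the search already succeeds at *k* = 1, which is consistent with the longer witness *a* · *a* · *a* versus *aa* · *aa* given in the text.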

Intuitively, there is no pair of paths through CA(*S*{m,n}) starting at the same state, over the same word, ending in the same state, where the number of increments differs by two. In such a case, DCSA<sup>a</sup>(CA(*S*{m,n})) would have to delay two increments, which our construction does not allow. The theorem below is proved in [21].

Theorem 4. *Given a regex R with flat counting, the algorithm of Section 6 returns* DCSA<sup>a</sup>(CA(*R*)) *if and only if R has synchronizing counting.*

Corollary 1. *Regexes with flat synchronizing counting have a fast matching algorithm.*

*Proof.* From Theorems 3 and 4.

*Counting with Markers.* Even though designing and recognizing synchronizing counting is usually intuitive, it may also be tricky. For instance, (\\\\d+\\\\.){3}, from the database of real-world regexes we use in our experiment, has synchronizing counting, while ICE Dims.{92}(( ?(X|\d+)){13}) does not.<sup>8</sup> A vast majority of real-world regexes we examined fortunately belong to very easily recognizable subclasses of synchronizing counting. The most wide-spread and easy to recognize are regexes with *letter-marked counting*, where every sub-expression *<sup>S</sup>*{m,n} has a set of marker letters such that every word from *L*(*S*) has exactly one occurrence of a marker letter. <sup>9</sup>

Marker letters may be generalized to *marker words*, though markers that can arise by concatenation of several words from *L*(*S*) cannot be used. The condition that has to be satisfied is that any word from *L*(*S*)<sup>*k*</sup>, *k* ∈ ℕ, has exactly *k* non-overlapping occurrences of marker words as infixes. Another sufficient property of *S* is that it has words of a *uniform length*. The idea of markers may be generalized further until the point when the set of marker words is specified by general regexes, when we get precisely the synchronizing counting. The regexes with letter-marked counting are easily recognizable by humans as well as machines (see a simple *O*(|*R*|<sup>2</sup>)-time algorithm in [21]).
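To make the letter-marked condition concrete, the sketch below checks whether a *given* set of letters marks a counted sub-expression, i.e. whether every word of *L*(*S*) contains exactly one marker letter. It is not the *O*(|*R*|<sup>2</sup>) algorithm of [21]; the tiny regex AST (`Letter`, `Concat`, `Union`, `Star`) and the function names are assumptions made for illustration, and the sketch does not search for a marker set. It computes the minimum and maximum number of marker letters over all words of the language.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    char: str


@dataclass(frozen=True)
class Concat:
    left: object
    right: object


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    body: object


def marker_range(node, markers):
    """Return (min, max) count of marker letters over all words of L(node)."""
    if isinstance(node, Letter):
        n = 1 if node.char in markers else 0
        return n, n
    if isinstance(node, (Concat, Union)):
        lo1, hi1 = marker_range(node.left, markers)
        lo2, hi2 = marker_range(node.right, markers)
        if isinstance(node, Concat):
            return lo1 + lo2, hi1 + hi2
        return min(lo1, lo2), max(hi1, hi2)
    if isinstance(node, Star):
        _, hi = marker_range(node.body, markers)
        # Zero iterations give 0 markers; repetition is unbounded otherwise.
        return 0, (math.inf if hi > 0 else 0)
    raise TypeError(f"unknown node: {node!r}")


def is_marked_by(node, markers):
    """True iff every word of L(node) has exactly one marker letter."""
    return marker_range(node, markers) == (1, 1)


a, b, c = Letter("a"), Letter("b"), Letter("c")
print(is_marked_by(Concat(a, Star(c)), {"a"}))                     # ac*
print(is_marked_by(Union(Concat(a, b), Concat(b, a)), {"a"}))      # ab|ba
print(is_marked_by(Concat(a, Star(Concat(a, b))), {"a"}))          # a(ab)*
```

Under these assumptions, ac\* and ab|ba are marked by {a}, while a(ab)\* is marked by neither {a} nor {b}; the text classifies it as synchronizing on different grounds (the unique split at the first *a*).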

## 8 Practical Considerations

Although the main point of this work is the theoretical feasibility of fast matching with synchronizing counting, we will also argue that the results are of practical relevance. To this end, we show experimentally that synchronizing counting and marked counting cover a majority of practical regexes. We also give arguments that matching with the CSA constructed in Section 6 can be done efficiently.

<sup>8</sup> An automated way of identifying synchronizing counting would be running the CSA-to-DCSA determinization from Section 6, but this is exponential in |*R*|.

<sup>9</sup> Letter-marked counting is a strict superset of the class that is conjectured in [36] to be handled by the algorithm of [36]. The conjecture of [36] is also not correct, as shown in [21].

#### 8.1 Occurrence of Synchronizing Counting in Practice

To substantiate the practical relevance of synchronizing counting regexes, we examined a large sample of practical regexes using a simple checker of letter-marked counting. The benchmark consists of over 540 000 regexes collected from (1) a large scale analysis of software projects [10]; (2) regexes used by network intrusion detection systems Snort [27], Bro [29], Sagan [34]; (3) the academic papers [42,38]; and (4) the RegExLib database of regexes [28].

From the regexes that we could parse<sup>10</sup>, 31 975 contained counting. We selected those with flat counting and with the sum of upper bounds of counters larger than 20 (as was done in [36] to filter out counting with small bounds that can be handled through counter unfolding and traditional methods)<sup>11</sup>. This left us with 5 751 regexes. From these, only 46 regexes (0.8%) have counting that is not letter-marked. Furthermore, we manually checked these regexes and we identified that 22 of them have synchronizing counting. We have therefore found only 24 regexes with non-synchronizing counting, i.e., 0.4% of the examined set of regexes with flat counting.

The 24 non-synchronizing regexes are listed in [21]. Some of them may clearly be rewritten with synchronizing counting, such as (.+){25}(.\*), which can be rewritten as .{25,}(.\*). We speculate that some of them might in fact represent a mistake, such as (.\*){1,32000}[bc] where the counter matches the empty word, or (\n\s+)(criterion .\*\n)(\s.+){1,99} where the \s.+ might have been intended as \s\S+ (\s are white spaces, \S are all the other characters). Synchronizing counting seems to capture the intuition with which counting is often written, hence reporting non-synchronizing counting might help identify bugs.

By the same methodology and from a nearly identical benchmark, [36] arrived at a sample of 5 000 regexes with flat counting with the sum of bounds larger than 20. The algorithm of [36] did not cover 571 regexes from the 5 000, which is 11% of the examined set of regexes with flat counting (in contrast to the 0.4% with non-synchronizing counting and the 0.8% with counting that is not letter-marked, measured on a slightly larger set of regexes). The two sets of regexes with flat counting, the 5 751 of ours and the 5 000 of [36], are not perfectly identical, however. Differences are to a small degree caused by differences in the base database ([36] uses about 18 more regexes that are proprietary and excludes 26 regexes with counter bounds larger than 1 000), and to a larger degree by small differences in the parsers.

#### 8.2 Practical Efficiency of Matching with Synchronizing Counting

The size and the worst-case time of simulation of DCSA<sup>a</sup>(CA(*R*)) are still exponential in the number of states of CA(*R*) (namely, *O*(22*<sup>m</sup> m*) and *O*(2<sup>2</sup>*mmn*) where *m* = |*R*| equals the number of states of CA(*R*), cf. Theorem 3). The potential problem is that the algorithm may generate at most 2*<sup>m</sup>* counters, and this potentially threatens the practicality of our matching algorithm.

<sup>10</sup> We did not parse 38 558 regexes since their syntax was broken or contained some advanced features we do not support.

<sup>11</sup> 926 regexes contain nested counting and 25297 regexes contain small upper bounds.

First, it should be noted that the *m* in the exponent can be decreased from the size of the entire regex to the size of the counted sub-expression, which is usually very small. Then, although an efficient implementation is beyond the scope of this paper and we leave it as future work, we give some indirect arguments for the practicality of the CA-to-CSA algorithm.<sup>12</sup>

By the standard techniques of register allocation [1], it is possible to dramatically decrease the number of counters and of counter assignments other than identity. In fact, by simply eliminating needless renaming of counters and reusing the same name whenever possible, our algorithm creates CSA isomorphic to those of [36] when run on regexes handled by [36]. The work [36] already shows that simulating these CSA may be done efficiently and that it brings dramatic improvements over the best matchers on counting-intensive examples.

In our experience with hand-simulating the algorithm on practical examples, cases not handled by [36] do not behave much differently, and the numbers of CSA counters do not have a strong tendency to explode.

## 9 Conclusions

We have extended the regex matching algorithm of [36] and shown that the extended version allows fast pattern matching of so-called synchronising regexes, a class of regexes that we have newly introduced. The class of synchronising regexes significantly extends all previously known classes of regexes that allow fast matching and covers a majority of regexes appearing in practice (wrt. our empirical study).

In the future, we plan to study extensions of the presented techniques to regexes with nested counting (non-flat). This will probably require a more sophisticated alternative of the offset-list data structure for sets, capable of storing relations of numbers. An interesting question is also how and when regexes can be rewritten to a synchronizing form and for what cost.

## Acknowledgment

This work has been supported by the Czech Ministry of Education, Youth and Sports project LL1908 of the ERC.CZ programme, the Czech Science Foundation project 23- 06506S, and the FIT BUT internal project FIT-S-23-8151.

## References

1. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools (2nd Edition). Addison Wesley (August 2006), http://www.amazon.ca/exec/obidos/ redirect?tag=citeulike09-20&path=ASIN/0321486811

<sup>12</sup> A competitive matcher that runs on real-world regexes requires an extensive infrastructure, optimized data structures for the shared registers, and ideally an on-the-fly version of the CA-to-CSA determinization (similar to the online DFA simulation).

410 L. Holík et al.




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Compositional Learning for Interleaving Parallel Automata**

Faezeh Labbaf<sup>1</sup>, Jan Friso Groote<sup>2</sup>, Hossein Hojjat<sup>1,3</sup>, and Mohammad Reza Mousavi<sup>4</sup>

<sup>1</sup> Tehran Institute for Advanced Studies (TeIAS), Khatam University, Tehran, Iran, f.labaf@khatam.ac.ir

<sup>2</sup> Eindhoven University of Technology, Eindhoven, The Netherlands, j.f.Groote@tue.nl

<sup>3</sup> University of Tehran, Tehran, Iran, hojjat@ut.ac.ir

<sup>4</sup> King's College London, London, UK, mohammad.mousavi@kcl.ac.uk

**Abstract.** Active automata learning has been a successful technique to learn the behaviour of state-based systems by interacting with them through queries. In this paper, we develop a compositional algorithm for active automata learning in which systems comprising interleaving parallel components are learned compositionally. Our algorithm automatically learns the structure of systems while learning the behaviour of the components. We prove that our approach is sound and that it learns a maximal set of interleaving parallel components. We empirically evaluate the effectiveness of our approach and show that it requires significantly fewer input symbols and resets while learning systems. Our empirical evaluation is based on a large number of subject systems obtained from a case study in the automotive domain.

## **1 Introduction**

Active automata learning has been successfully used to learn models of complex industrial systems such as communication- and security protocols [11], biometric passports [2], smart cards [1], large-scale printing machines [33], and lithography machines for integrated circuits [32,15]; we refer to the recent survey by Howar and Steffen on the practical applications of active automata learning [16]. Throughout these applications of automata learning, scalability issues have been pointed out [32,15]. It has also been suggested that compositional learning, i.e., learning a system through learning its components, is a promising approach to tame the complexity of learning [10,12].

Some early attempts have been recently made in learning structured models of systems [27,10] (we refer to the Related Work for an in-depth analysis). For example, the approach proposed by al-Duhaiby and Groote [10] decomposes the learning process into learning its parallel components; however, it relies on a deep knowledge of the system under learning, and the intricate interaction of the various actions being learned. In this paper, we propose an approach based on Dana Angluin's celebrated L<sup>∗</sup> algorithm [6], to learn the components of a system featuring an interleaving parallel composition. Our approach, called CL<sup>∗</sup>, does not assume any pre-knowledge of the structure and the alphabet of these components; instead, we learn this information automatically and on-the-fly, while providing a rigorous guarantee of the learned information. This is particularly relevant in the context of legacy and black-box systems where architectural discovery is challenging [8,22].

Fig. 1: (a) Initial system with two concurrent FSMs (b) Partition the input alphabet to 4 elements and learn each component individually (c) Use the counterexample ab to merge two components

The gist of our approach is to learn the System Under Learning (SUL) in separate components with disjoint alphabets. We start with a partition comprising only singleton sets. The interleaving parallel composition of the components gives us the total behavior of the system. We pass the result to the teacher, and by exploiting the counter-examples returned, we iteratively merge the alphabet of the individual components.

**Example.** Figure 1(a) shows an example of two parallel Finite State Machines (FSMs) over the input alphabet {a, b, c, d} and output alphabet {0, 1}. We start by partitioning the alphabet into disjoint singleton sets of elements. The parallel composition of the 4 learned FSMs of Figure 1(b) does not comply with the original system, and the teacher may return the counter-example ab. The string ab generates the output sequence 10 in (a) but the output sequence in (b) is 11. The counter-example suggests to merge the sets {a} and {b} and restart the learning process which leads to the FSMs in Figure 1(c). One further merging step results in learning the original system. We provide a theoretical proof of correctness of this compositional construction, meaning that it is guaranteed to construct a correct system.
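The merging step in this example can be pictured as an operation on partitions of the input alphabet. The sketch below is only one plausible reading, under the assumption that every block containing a symbol of the counter-example is merged into a single block; the precise rule used by CL<sup>∗</sup> is the one given in Section 4, and the function name `merge_on_counterexample` is hypothetical.

```python
def merge_on_counterexample(partition, counterexample):
    """Merge all blocks of `partition` that share a symbol with the word.

    `partition` is a set of frozensets of input symbols. Assumption of this
    sketch: every block touched by the counter-example is merged into one.
    """
    symbols = set(counterexample)
    touched = {block for block in partition if block & symbols}
    if not touched:
        return set(partition)
    merged = frozenset().union(*touched)
    return (set(partition) - touched) | {merged}


# Start from singleton blocks over the alphabet {a, b, c, d}.
initial = {frozenset(x) for x in "abcd"}
refined = merge_on_counterexample(initial, "ab")
print(sorted("".join(sorted(block)) for block in refined))
```

Starting from the singleton partition, the counter-example ab yields the blocks {a, b}, {c}, and {d}, which corresponds to the step from Figure 1(b) to Figure 1(c) as described in the text.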

To study the effectiveness of our approach in practice, we designed an empirical experiment to investigate the following two research questions:

**RQ1** Does CL<sup>∗</sup> require fewer resets, compared to L<sup>∗</sup>?

**RQ2** Does CL<sup>∗</sup> require fewer input symbols, compared to L<sup>∗</sup>?

Our research questions are motivated by the following facts: 1) Resets are a major contributing factor in learning practical systems as they are immensely time- and resource consuming [31]. Hence, reducing the number of resets can have a significant impact in the learning process. 2) The total number of symbols used in interacting with the system under learning provides us with a total measure of cost for the learning process and hence, reducing the total cost is a fair indicator of improved efficiency [36,9].

To answer these questions, we use a benchmark based on an industrial automotive system. We design a number of experiments on learning various combinations of components in this system, gather empirical data, and analyse them through statistical hypothesis testing. Our results indicate that our compositional approach significantly improves the efficiency of learning compared to the monolithic L<sup>∗</sup> learning algorithm. The implementation of the algorithm, experiments, and their results can be found on-line in our lab package [23] (https://github.com/faezeh-lbf/CL-Star).

The remainder of this paper is organised as follows. In Section 2, we review the related work and position our research with respect to the state of the art. In Section 3, we present the preliminary definitions that are used throughout the rest of the paper. In Section 4, we present our algorithm and its proof of correctness and termination. We evaluate our algorithm on a benchmark from the automotive domain in Section 5. We conclude the paper and present the directions of our ongoing and future research in Section 6.

## **2 Related Work**

Active automata learning is a technique used to find the underlying model of a black box system by posing queries and building a hypothesis in an iterative manner. There is substantial early work in this domain, e.g., under the name system identification or grammar inference; we refer to the accessible introduction by Vaandrager [36] for more information. A seminal work in this domain is the L<sup>∗</sup> algorithm by Dana Angluin [6], which comes with theoretical complexity bounds for the learning process using a representation called the "Minimally Adequate Teacher" (MAT).

MAT hypothesises a teacher that is capable of responding to membership queries (MQs) and equivalence queries (EQs); the former checks the outcome of a sequence of inputs (e.g., with their respective outputs, or with their membership in the language of the automaton) and the latter checks whether a hypothesised automaton is equivalent to the system under learning. Our work replaces a single MAT with multiple MATs that can potentially run in parallel and learn different components of the black-box system automatically.

Learning structured systems and in particular, compositional learning of parallel systems has been studied recently in the literature. Moerman [27] proposes an algorithm to learn parallel interleaving Moore machines. Our algorithm differs from Moerman's algorithm in that in the parallel composition of Moore machines, the output of each individual component is explicitly specified, because the output of the system is specified as a tuple of the outputs of its components. In other words, the underlying structure is immediately exposed by considering the type of outputs produced by the system under learning. However, in our approach, we need to identify the components and assign outputs to them on-the-fly since the decomposition is not explicit in parallel composition. Al-Duhaiby and Groote [10] learn parallel labelled transitions systems with the possibility of synchronisation among them. In order to develop their algorithm, they assume a priori knowledge of mutual dependencies among actions in terms of a confluence relation. This type of information is difficult to obtain and the domain knowledge in this regard may be error prone. Particularly for legacy and large black-box systems (e.g., binary code), architectural discovery has proven challenging [8,22]. We address this challenge and go beyond the existing approaches by learning about confluence of actions on-the-fly through observing the minimal counter-examples generated by the MAT(s).

Frohme and Steffen [12] introduce a compositional learning approach for Systems of Procedural Automata [13]; these are collections of DFAs that may "call" each other, akin to the way non-terminals may be used in defining other non-terminals in a grammar. Their approach is essentially different from ours in that the calls across automata are assumed to be observable and hence the general structure is assumed to be known; in our approach, we learn the structure by observing implicit dependencies among the learned automata through analysing counter-examples. Also, their approach is aimed at a richer and more expressive type of systems, namely pushdown systems, which justifies the requirement for additional information.

L<sup>∗</sup> has been improved significantly in the past few years; the major improvements upon L<sup>∗</sup> can be broadly categorised into three categories: 1) improving the data structures used to store and retrieve the learned information [21,31,19,37]; 2) improving the way counter-examples are processed in refining the hypothesis [31,28,3,17]; 3) learning more expressive models, such as register- [18,14] and timed automata [34,5]. This third category of improvements is orthogonal to our contribution and extension of our approach can be considered in those contexts as well.

Two notable recent improvements, in the first two categories, are L# [37] and L<sup>λ</sup> [17], respectively. L# uses the notion of apartness to organise and maintain a tree-shaped data-structure about the learned automaton. L<sup>λ</sup> uses a search-based method to incorporate the information about the counter-example into the learned hypothesis. The improvements brought about by L<sup>λ</sup> can be readily incorporated into our approach, particularly since our approach relies on finding minimal counter-examples. Integrating our approach into L# requires a more careful consideration of maintaining and composing tree-shaped data structures when detecting dependencies. We expect that both of these combinations will further improve the efficiency of our proposed method.

# **3 Preliminaries**

In this section, we review the basic notions used throughout the remainder of the paper. We start by formalising the notion of a finite state machine, which is the underlying model of the system under learning and move on to parallel composition and decomposition (called projection) as well as the concept of (in)dependent actions, which are essential in identifying the parallel components. Finally, we conclude this section by recalling the basic concepts of active automata learning and the L<sup>∗</sup> algorithm.

### **3.1 Finite State Machines (FSMs)**

Finite state machines (also called Mealy machines), defined below, are straightforward generalisations of finite automata in which the transitions produce outputs (rather than only indicating acceptance or non-acceptance):

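**Definition 1.** (Finite State Machine) A Finite State Machine (FSM) $M$ is a six-tuple $(S, s_0, I, O, \delta, \lambda)$ where:

**–** $S$ is a finite set of states,
**–** $s_0 \in S$ is the initial state,
**–** $I$ is the input alphabet,
**–** $O$ is the output alphabet,
**–** $\delta : S \times I \to S$ is the state transition function, and
**–** $\lambda : S \times I \to O$ is the output function.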

An FSM starts in the initial state $s_0$ and accepts a word (a sequence of actions of its input alphabet) in order to produce an equally-sized sequence of outputs. The state transition function $\delta$ and the output function $\lambda$ determine the next state and the output of an FSM upon receiving a single input. For each $s, s' \in S$, $i \in I$, and $o \in O$, we write $s \xrightarrow{i/o} s'$ when $\delta(s, i) = s'$ and $\lambda(s, i) = o$.

State transitions are extended inductively from a single input $i \in I$ to a sequence of inputs $w \in I^{*}$, i.e., we define $\delta(s, \epsilon) = s$ and $\lambda(s, \epsilon) = \epsilon$, where $\epsilon$ is the empty sequence; and for $s \in S$, $w \in I^{*}$, and $a \in I$, we have $\delta(s, wa) = \delta(\delta(s, w), a)$ and $\lambda(s, wa) = \lambda(s, w)\,\lambda(\delta(s, w), a)$, where juxtaposition of sequences denotes concatenation. For the sake of conciseness, we write $\delta(w)$ and $\lambda(w)$ instead of $\delta(s_0, w)$ and $\lambda(s_0, w)$.
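The inductive extension of $\delta$ and $\lambda$ to words amounts to a simple left-to-right run. The sketch below is a minimal illustration, assuming that both functions are stored as dictionaries keyed by (state, input) pairs; the class name `FSM`, its field names, and the two-state machine are hypothetical and are not taken from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class FSM:
    initial: str
    inputs: frozenset
    delta: dict  # (state, input) -> next state, total on states x inputs
    lam: dict    # (state, input) -> output, total on states x inputs

    def run(self, word, state=None):
        """Return (delta(s, w), lambda(s, w)); s defaults to the initial state."""
        s = self.initial if state is None else state
        outputs = []
        for a in word:
            outputs.append(self.lam[(s, a)])  # lambda(delta(s, w), a)
            s = self.delta[(s, a)]            # delta(delta(s, w), a)
        return s, "".join(outputs)


# A hypothetical machine over inputs {a, b} and outputs {0, 1}.
machine = FSM(
    initial="s0",
    inputs=frozenset("ab"),
    delta={("s0", "a"): "s1", ("s0", "b"): "s0",
           ("s1", "a"): "s1", ("s1", "b"): "s0"},
    lam={("s0", "a"): "1", ("s0", "b"): "1",
         ("s1", "a"): "0", ("s1", "b"): "0"},
)

print(machine.run(""))     # the empty word leaves the state unchanged
print(machine.run("ab"))
print(machine.run("aab"))
```

The output sequence always has the same length as the input word, as required by the definition.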

In much of the literature in active learning, the system under learning is assumed to be complete and deterministic and we follow this common assumption in Definition 1 by requiring the state transition and output relations to be total functions. While the determinism assumption is essential for our forthcoming results to hold, we expect that the existing recipes for learning non-deterministic state machines can be made compositional using a similar approach as ours.

### **3.2 (De)Composing FSMs**

Our aim is to produce a compositional learning algorithm for systems composed of interleaving parallel components, defined below. Due to the interleaving nature of parallel composition and determinism of the system under learning, the alphabets of these components are assumed to be disjoint.

**Definition 2.** (Interleaving Parallel Composition) For two FSMs $M_i = (S_i, s_{0_i}, I_i, O_i, \delta_i, \lambda_i)$, with $i \in \{0, 1\}$, where $I_0 \cap I_1 = \emptyset$, the interleaving parallel composition of $M_0$ and $M_1$, denoted by $M_0 \mathbin{||} M_1$, is an FSM defined as

$$(S_0 \times S_1,\ (s_{0_0}, s_{0_1}),\ I_0 \cup I_1,\ O_0 \cup O_1,\ \delta,\ \lambda)$$

where, for $s_0 \in S_0$, $s_1 \in S_1$, and $a \in I_0 \cup I_1$, $\delta$ and $\lambda$ are defined by

$$\delta((s_0, s_1), a) = \begin{cases} (\delta_0(s_0, a), s_1) & \text{if } a \in I_0, \\ (s_0, \delta_1(s_1, a)) & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \lambda((s_0, s_1), a) = \begin{cases} \lambda_0(s_0, a) & \text{if } a \in I_0, \\ \lambda_1(s_1, a) & \text{otherwise.} \end{cases}$$
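Definition 2 can be read as a product construction in which exactly one component moves on each input. The following sketch illustrates it under the assumption that $\delta$ and $\lambda$ are total dictionaries keyed by (state, input) pairs; the names `FSM`, `compose`, and `output`, as well as the two small component machines, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FSM:
    initial: object
    inputs: frozenset
    delta: dict  # (state, input) -> next state
    lam: dict    # (state, input) -> output


def states_of(m):
    """States of m, read off the (total) transition table."""
    return {s for (s, _) in m.delta}


def compose(m0, m1):
    """Interleaving parallel composition of two FSMs with disjoint inputs."""
    if m0.inputs & m1.inputs:
        raise ValueError("input alphabets must be disjoint")
    delta, lam = {}, {}
    for s0 in states_of(m0):
        for s1 in states_of(m1):
            for a in m0.inputs:  # only the first component moves
                delta[((s0, s1), a)] = (m0.delta[(s0, a)], s1)
                lam[((s0, s1), a)] = m0.lam[(s0, a)]
            for a in m1.inputs:  # only the second component moves
                delta[((s0, s1), a)] = (s0, m1.delta[(s1, a)])
                lam[((s0, s1), a)] = m1.lam[(s1, a)]
    return FSM((m0.initial, m1.initial), m0.inputs | m1.inputs, delta, lam)


def output(m, word):
    """Output sequence lambda(w) produced from the initial state."""
    s, out = m.initial, []
    for a in word:
        out.append(m.lam[(s, a)])
        s = m.delta[(s, a)]
    return "".join(out)


m0 = FSM("x0", frozenset("a"),
         {("x0", "a"): "x1", ("x1", "a"): "x1"},
         {("x0", "a"): "1", ("x1", "a"): "0"})
m1 = FSM("y0", frozenset("b"),
         {("y0", "b"): "y0"},
         {("y0", "b"): "1"})
both = compose(m0, m1)
print(output(both, "ab"), output(both, "aab"), output(both, "baa"))
```

Because the alphabets are disjoint, the output on each symbol depends only on the local state of the component that owns that symbol, which is what makes the order of interleaving irrelevant for each component.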

Next, we define the notions of projections for FSMs and for words; these notions are further used in the notion of (in)dependence and eventually in our proof of correctness to establish that the composed system has the same behaviour as the composition of the learned components.

**Definition 3.** (Projection of an FSM) The projection of an FSM $M = (S, s_0, I, O, \delta, \lambda)$ on a set of inputs $I' \subseteq I$, denoted by $P(M, I')$, is an FSM $(S, s_0, I', O', \delta', \lambda')$, where

**–** $\delta'(s, a) = \delta(s, a)$ for $a \in I'$,
**–** $\lambda'(s, a) = \lambda(s, a)$ for $a \in I'$, and
**–** $O' = \{o \in O \mid \exists a \in I'.\ \exists s \in S.\ \lambda(s, a) = o\}$.

**Definition 4.** (Projection of a word) The projection of a word $w \in I^{*}$ on a set of inputs $I' \subseteq I$, denoted by $P_{I'}(w)$, is inductively defined as follows:

$$\begin{aligned} P_{I'}(\epsilon) &:= \epsilon, \\ P_{I'}(au) &:= \begin{cases} aP_{I'}(u) & \text{if } a \in I', \\ P_{I'}(u) & \text{otherwise.} \end{cases} \end{aligned}$$

**Definition 5.** (Projection of an output sequence) The projection of the output sequence $w = o_1 \dots o_n$ with respect to an equally-sized sequence of inputs $v = i_1 \dots i_n \in I^*$ and a subset of inputs $I' \subseteq I$, denoted by $P_{I'}(w, v)$, is defined as follows:

$$\begin{aligned} P_{I'}(\epsilon, \epsilon) &:= \epsilon, \\ P_{I'}(ow, av) &:= \begin{cases} oP_{I'}(w, v) & \text{if } a \in I', \\ P_{I'}(w, v) & \text{otherwise.} \end{cases} \end{aligned}$$
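
The two inductive definitions above amount to filtering a sequence by the positions whose input symbol lies in $I'$. A minimal sketch follows; the function names `project_word` and `project_outputs` are chosen here for illustration and do not come from the paper.

```python
def project_word(word, sub_inputs):
    """P_{I'}(w): keep only the input symbols that belong to I'."""
    return [a for a in word if a in sub_inputs]


def project_outputs(outputs, word, sub_inputs):
    """P_{I'}(w, v): keep the outputs at positions whose input belongs to I'."""
    if len(outputs) != len(word):
        raise ValueError("output and input sequences must have equal length")
    return [o for o, a in zip(outputs, word) if a in sub_inputs]


print(project_word("abcab", {"a", "c"}))
print(project_outputs([1, 0, 1, 1, 0], "abcab", {"a", "c"}))
```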

**Definition 6.** ((In)Dependent Actions) Consider an FSM $M$ with a set of inputs $I$. The subsets $I_0, \dots, I_n \subseteq I$ form an independent partition of $I$ when for any $u \in I^*$, $\lambda_{P(M, I_0) \parallel \dots \parallel P(M, I_n)}(u) = \lambda_M(u)$. Two inputs $i_0, i_1 \in I$ are independent when they belong to two distinct subsets of an independent partition. Two input actions are dependent when they are not independent.

**Example.** The partition $\{\{a\}, \{b\}, \{c, d\}\}$ in Figure 1(a) is not an independent partition because $\lambda_M(ab) = 10$ but $\lambda_{P(M,\{a\}) \parallel P(M,\{b\}) \parallel P(M,\{c,d\})}(ab) = 11$.
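
Definition 6 can be explored with a bounded check, sketched below under stated assumptions. The two-state machine is hypothetical (it is not the FSM of Figure 1(a), although it reproduces the outputs 10 versus 11 on the word ab), and the search only inspects words up to a given length, so a `None` result is evidence of independence rather than a proof.

```python
from itertools import product


def run(delta, lam, s0, word):
    """Outputs of the monolithic machine M on a word."""
    s, out = s0, []
    for a in word:
        out.append(lam[s, a])
        s = delta[s, a]
    return out


def run_projected_composition(delta, lam, s0, partition, word):
    """Outputs of P(M, I_0) || ... || P(M, I_n): one private copy of M's state per block."""
    local = [s0] * len(partition)
    out = []
    for a in word:
        i = next(k for k, block in enumerate(partition) if a in block)
        out.append(lam[local[i], a])
        local[i] = delta[local[i], a]  # only the owning component moves
    return out


def find_dependency_witness(delta, lam, s0, partition, max_len):
    """Shortest word (up to max_len) on which the composition disagrees with M."""
    alphabet = sorted(set().union(*partition))
    for n in range(1, max_len + 1):
        for word in product(alphabet, repeat=n):
            if run(delta, lam, s0, word) != run_projected_composition(
                    delta, lam, s0, partition, word):
                return word
    return None


# Hypothetical machine: 'a' sets a flag; the output of 'b' depends on that flag.
delta = {(0, "a"): 1, (1, "a"): 1, (0, "b"): 0, (1, "b"): 1}
lam = {(0, "a"): 1, (1, "a"): 1, (0, "b"): 1, (1, "b"): 0}

print(find_dependency_witness(delta, lam, 0, [{"a"}, {"b"}], max_len=3))
print(find_dependency_witness(delta, lam, 0, [{"a", "b"}], max_len=3))
```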

It immediately follows from Definition 6 and associativity of parallel composition (with respect to trace equivalence) that any coarser partitioning based on an independent partition is also an independent partitioning; this is formalised in the following corollary.

**Corollary 1.** By combining two or more sets of an independent partition, the resulting partition remains independent.

Moreover, it holds that any smaller subset of an independent partitioning is also an independent partitioning of the original state machine projected on the alphabet of the smaller subset, as specified and proven below.

**Lemma 1.** Consider an independent partition $I_0, \dots, I_n$ of inputs $I$ for an FSM $M$; then for $K \subseteq \{0, \dots, n\}$, $\{I_i \mid i \in K\}$ is an independent partition for $P(M, \bigcup_{i \in K} I_i)$.

Proof. Consider any subset $K \subseteq \{0, \dots, n\}$ and $\{I_i \mid i \in K\}$ and consider any input sequence $u \in (\bigcup_{i \in K} I_i)^*$. Since $u$ does not contain a symbol that is in any $I_j$ for $j \notin K$, we have that $\lambda_{\parallel_{i \in K} P(M, I_i)}(u) = \lambda_{P(M, I_0) \parallel \dots \parallel P(M, I_n)}(u)$. Since $I_0, \dots, I_n$ are independent, it follows likewise that $\lambda_{P(M, I_0) \parallel \dots \parallel P(M, I_n)}(u) = \lambda_M(u)$. Using again that $u$ has no symbol in any $I_j$ for $j \notin K$, we know that $\lambda_M(u) = \lambda_{P(M, \bigcup_{i \in K} I_i)}(u)$. Hence, $\lambda_{\parallel_{i \in K} P(M, I_i)}(u) = \lambda_{P(M, \bigcup_{i \in K} I_i)}(u)$, which was to be shown.

**Lemma 2.** For any independent partition $I_0, \dots, I_n \subseteq I$, $w \in I^*$, $0 \le i \le n$, and state $s$, it holds that $P_{I_i}(\lambda_M(s, w), w) = \lambda_{P(M, I_i)}(s, P_{I_i}(w))$.


Proof. The proof uses induction on the length of w. Instead of proving the thesis, we prove the following stronger statement, which is possible because M can be viewed as the parallel construction of independent components.

$$P\_{I\_i}(\lambda\_M((s\_0, \dots, s\_n), w), w) = \lambda\_{P(M, I\_i)}((s'\_0, \dots, s'\_n), P\_{I\_i}(w)) \text{ with } s\_i = s'\_i.$$

Note that the lemma directly follows from this. Below we write $\mathbf{s}$ for $s_0, \dots, s_n$, and likewise for $\mathbf{s'}$ and $\mathbf{s''}$.

The base case ($|w| = 0$) holds trivially as $w = \epsilon$. For the induction step we assume that the induction hypothesis holds for $|w| = k$ and we show that it holds for $w' = aw$ for arbitrary $a \in I$.

We first consider the case where $a \notin I_i$. We derive

$$\begin{aligned} P\_{I\_i}(\lambda\_M(\mathbf{s}, aw), aw) &= P\_{I\_i}(\lambda\_M(\mathbf{s}, a)\lambda\_M(\delta(\mathbf{s}, a), w), aw) & \text{Definition 1} \\ &= P\_{I\_i}(\lambda\_M(\delta(\mathbf{s}, a), w), w) & \text{Definition 5.} \\ &= \lambda\_{P(M, I\_i)}(\mathbf{s'}, P\_{I\_i}(w)) & \text{Induction hypothesis.} \\ &= \lambda\_{P(M, I\_i)}(\mathbf{s''}, P\_{I\_i}(aw)) & \text{Definition 4.} \end{aligned}$$

By construction the $i$-th state in $\delta(\mathbf{s}, a)$ is equal to $s_i$ as $a \notin I_i$. Hence, using the induction hypothesis, $s'_i = s_i$. By definition $\mathbf{s'} = \delta(\mathbf{s''}, a)$ and hence, $s''_i = s'_i = s_i$ as we had to show.

The other case we must consider is $a \in I_i$. Again the derivation is straightforward.

$$\begin{split} P\_{I\_i}(\lambda\_M(\mathbf{s}, a\mathbf{w}), a\mathbf{w}) &= P\_{I\_i}(\lambda\_M(\mathbf{s}, a)\lambda\_M(\delta(\mathbf{s}, a), w), a\mathbf{w}) & \text{Definition 1} \\ &= \lambda\_M(\mathbf{s}, a)P\_{I\_i}(\lambda\_M(\delta(\mathbf{s}, a), w), w) & \text{Definition 5.} \\ &= \lambda\_M(\mathbf{s'}, a)\lambda\_{P(M, I\_i)}(\delta(\mathbf{s'}, a), P\_{I\_i}(w)) & \text{Induction hypothesis.} \\ &= \lambda\_{P(M, I\_i)}(\mathbf{s}, P\_{I\_i}(aw)) & \text{Definition 4.} \end{split}$$

Using the induction hypothesis it follows that $s_i = s'_i$, which concludes the proof. □

## **3.3 Model Learning**

Active model learning, introduced by Dana Angluin, was originally designed to formulate a hypothesis $H$ about the behavior of a System Under Learning (SUL) as an FSM. Model learning is often described in terms of the Minimally Adequate Teacher (MAT). In the MAT framework, there are two phases: (i) hypothesis construction, where a learning algorithm poses Membership Queries (MQ) to gain knowledge about the SUL using reset operations and input sequences; and (ii) hypothesis validation, where based on the model learned so far, the learner proposes a hypothesis $H$ about the "language" of the SUL and asks Equivalence Queries (EQ) to test it. The results of the queries are organised in an observation table. The table is iteratively refined and is used to formulate $H$.

**Definition 7.** (Observation Table) An observation table is a triple $(S, E, T)$, where $S \subseteq I^*$ is a prefix-closed set of input strings (i.e., prefixes); $E \subseteq I^+$ is a suffix-closed set of input strings (i.e., suffixes); and $T$ is a table where rows are labeled by elements from $S \cup (S \cdot I)$, columns are labeled by elements from $E$, such that for all $pre \in S \cup (S \cdot I)$ and $suf \in E$, $T(pre, suf)$ is the SUL's output suffix of size $|suf|$ for the input sequence $pre \cdot suf$.

The L<sup>∗</sup> algorithm initially starts with $S$ containing only the empty word $\epsilon$, and $E$ equal to the input alphabet $I$. Two crucial properties of the observation table, closedness and consistency, defined below, allow for the construction of a hypothesis.

**Definition 8.** (Closedness Property) An observation table is closed iff for all $w \in S \cdot I$ there is a $w' \in S$ such that for all $suf \in E$, $T(w, suf) = T(w', suf)$ holds.

**Definition 9.** (Consistency Property) An observation table is consistent iff for all $pre_1, pre_2 \in S$, if for all $suf \in E$, $T(pre_1, suf) = T(pre_2, suf)$, then it holds that $T(pre_1 \cdot \alpha, suf) = T(pre_2 \cdot \alpha, suf)$ for all $\alpha \in I$, $suf \in E$.
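
Definitions 7 to 9 can be made concrete with a small sketch. The encoding below (words as tuples, `T` as a dictionary, a membership-query callback `mq` returning the full output sequence) and the toggle SUL are assumptions for illustration; LearnLib's actual data structures differ.

```python
def fill(S, E, alphabet, mq):
    """Build T for all rows in S ∪ S·I; T[pre, suf] is the last |suf| outputs of pre·suf."""
    T = {}
    for pre in set(S) | {s + (a,) for s in S for a in alphabet}:
        for suf in E:
            T[pre, suf] = mq(pre + suf)[-len(suf):]
    return T


def row(T, E, prefix):
    return tuple(T[prefix, e] for e in E)


def is_closed(S, E, T, alphabet):
    """Every row of S·I must already appear as the row of some prefix in S."""
    rows_S = {row(T, E, s) for s in S}
    return all(row(T, E, s + (a,)) in rows_S for s in S for a in alphabet)


def is_consistent(S, E, T, alphabet):
    """Prefixes with equal rows must keep equal rows after any one-symbol extension."""
    for p1 in S:
        for p2 in S:
            if p1 != p2 and row(T, E, p1) == row(T, E, p2):
                if any(row(T, E, p1 + (a,)) != row(T, E, p2 + (a,)) for a in alphabet):
                    return False
    return True


def toggle_mq(word):
    """Hypothetical SUL: every input flips a bit and outputs the new bit."""
    state, out = 0, []
    for _ in word:
        state ^= 1
        out.append(state)
    return tuple(out)


alphabet, E = ["a"], [("a",)]
S1 = [()]
T1 = fill(S1, E, alphabet, toggle_mq)
S2 = [(), ("a",)]
T2 = fill(S2, E, alphabet, toggle_mq)
print(is_closed(S1, E, T1, alphabet), is_closed(S2, E, T2, alphabet))
```

With only the empty prefix, the row of a is new, so the table is not closed; adding a to $S$ closes it.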

MQs are posed until these two properties hold, and once they do, a hypothesis $H$ is formulated. After formulating $H$, L<sup>∗</sup> works under the assumption that an EQ can return either a counter-example (CE) exposing the non-conformance, or yes, if $H$ is indeed equivalent to the SUL. When a CE is found, a CE processing method adds prefixes and/or suffixes to the observation table and hence refines $H$. The aforementioned steps are repeated until an EQ confirms that $H$ and the SUL are the same. In between MQs, we often need to bring the FSM back to a known state; this is done through reset operations, which are one of our metrics for measuring the efficiency of the algorithm. EQs are posed by running a large number of test cases and hence they are two to three orders of magnitude larger than MQs. These test cases are generated through a random walk of the graph or through a deterministic algorithm that tests all states and transitions for a given fault model. Two examples of deterministic test-case generation algorithms are the W- and WP-method [7]. It appears from recent empirical evaluations that for realistic systems deterministic equivalence queries are not efficient [4].

Since we are going to be learning the system in terms of components with disjoint alphabets, we define the following projection operator that removes all the transitions that are not in the projected alphabet. Our compositional learning algorithm basically learns a black-box with respect to its projection on the actions available in each purported component.

**Definition 10.** (L<sup>∗</sup> with projected alphabet) Given an SUL $M = (S, s_0, I, O, \delta, \lambda)$ and $I' \subset I$, $L^*(M, I')$ returns $P(M, I')$ by running algorithm L<sup>∗</sup> with projected alphabet $I'$ on $M$.

## **4 Compositional Active Learning**

In this section, we present an algorithm that learns the SUL in separate components and uses the interleaving parallel composition of the learned components to reach the total behavior of the system. Each component has an input alphabet $I_i$, which is disjoint from the alphabet of all the other components. The set of the input alphabets of components $I^F = \{I_1, \dots, I_n\}$ is a partition of the total system's input alphabet. The main idea is to find an independent partitioning $I^F$. To reach such a partitioning, we start with a partition with singleton sets and iteratively merge those sets that are found to be dependent on each other. Then for $I_i \in I^F$, we learn the SUL with the projected alphabet $I_i$, and compute the product of the obtained components with interleaving parallel composition. The result is equivalent to the SUL if $I^F$ is an independent partition.

### **Algorithm 1:** Compositional Learning Algorithm (CL<sup>∗</sup>)

**Result:** $H$

1. **Input:** $I^F = \{I_1, \dots, I_n\}$, $M$
2. $H$ ← LearnInParts($M$, $I^F$)
3. eq ← Equivalence-Query($H$, $M$)
4. **while** eq ≠ yes **do**
5. CE ← eq
6. $D$ ← InvolvedSets(CE, $I^F$)
7. $I^F$ ← Composition($I^F$, $D$)
8. $H$ ← LearnInParts($M$, $I^F$)
9. eq ← Equivalence-Query($H$, $M$)
10. **end**
11. **return** $H$, $I^F$

**Definition 11.** (LearnInParts) The LearnInParts function gets $M = (S, s_0, I, O, \delta, \lambda)$ and the partition $I^F = \{I_1, \dots, I_n\}$ of $I$ and returns the interleaving parallel composition of the learned components.

$$\mathit{LearnInParts}(M, I^F) = L^*(M, I_1) \parallel \dots \parallel L^*(M, I_n).$$

**Definition 12.** (Composition) Given a partition $I^F = \{I_1, \dots, I_n\}$ and $D \subseteq \{1, \dots, n\}$, the Composition of $I^F$ over $D$ merges all the $I_i$ ($i \in D$) in $I^F$.

$$\mathit{Composition}(I^F, D) = (I^F \setminus \{I_i \mid i \in D\}) \cup \{\bigcup_{i \in D} I_i\}.$$

Example. If $I^F = \{\{a\}, \{b\}, \{c\}, \{d\}\}$ and $D = \{1, 3, 4\}$, then $\mathit{Composition}(I^F, D) = \{\{a, c, d\}, \{b\}\}$.
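
The Composition operator and the example above can be replayed with a short sketch. Representing the partition as a Python list of frozensets with 1-based indices in `D` is a choice made here to mirror the paper's notation; the function name `composition` is illustrative.

```python
def composition(partition, D):
    """Merge the blocks whose 1-based indices are in D; keep all other blocks."""
    if not D:
        return list(partition)
    merged = frozenset().union(*(partition[i - 1] for i in D))
    kept = [block for i, block in enumerate(partition, start=1) if i not in D]
    return kept + [merged]


partition = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")]
result = composition(partition, {1, 3, 4})
print([sorted(block) for block in result])
```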

**Definition 13.** (InvolvedSets) The function InvolvedSets gets a counter-example CE and a partition $I^F = \{I_1, \dots, I_n\}$ and returns the indices of the sets in $I^F$ that contain at least one character of CE:

$$\mathit{InvolvedSets}(\mathsf{CE}, I^F) = \{ j \mid I_j \in I^F,\ \exists i.\ \mathsf{CE}[i] \in I_j \},$$

where the $i$-th character of CE is denoted by $\mathsf{CE}[i]$.
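
A minimal sketch of InvolvedSets follows, again assuming a list-of-sets encoding of the partition with 1-based indices; the counter-example word is hypothetical.

```python
def involved_sets(ce, partition):
    """Indices (1-based) of the blocks that contain at least one symbol of CE."""
    return {j for j, block in enumerate(partition, start=1)
            if any(symbol in block for symbol in ce)}


partition = [{"a"}, {"b"}, {"c"}, {"d"}]
print(sorted(involved_sets("aca", partition)))
```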

The function InvolvedSets allows us to detect some dependent sets by using a minimal counter-example since all actions in the counter-example are dependent, as we prove in Theorem 2.

Algorithm 1 shows the pseudo-code of the compositional learning algorithm. Initially the algorithm is called with the singleton partitioning $I^F$ of the alphabet $I$ and the SUL $M$, i.e., if the input alphabet is $I = \{a_1, a_2, \dots, a_n\}$, then the initial partition of the alphabet will be $I^F = \{\{a_1\}, \{a_2\}, \dots, \{a_n\}\}$. The LearnInParts method on line 2 learns each of the components given the corresponding alphabet set using the algorithm L<sup>∗</sup> and returns the interleaving parallel composition of the learned components. If the oracle (MAT) returns yes for the equivalence query regarding hypothesis $H$, the algorithm terminates and returns $H$. Otherwise an(other) iteration of the loop is performed. The *InvolvedSets* method in line 6 extracts the dependent sets from the counterexample returned by the oracle; subsequently, *Composition* merges those sets into one. The *LearnInParts* method in line 8 is run again and the loop continues until the correct hypothesis is learned. We assume that the oracle always returns a minimal counter-example; this assumption is used in the proof of soundness (Theorem 2).
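
The loop of Algorithm 1 can be sketched in a white-box setting, under two strong simplifying assumptions that are labeled here because they depart from a real learning campaign: (i) instead of running L<sup>∗</sup>, each component is taken to be the exact projection $P(M, I_i)$, which is what Definition 10 says L<sup>∗</sup> returns; and (ii) the equivalence oracle is an exhaustive breadth-first search over the known model, which yields a shortest counter-example. The three-input SUL is hypothetical.

```python
from collections import deque


def hypothesis_step(delta, lam, partition, local, a):
    """One step of P(M, I_1) || ... || P(M, I_n); local holds one state per block."""
    i = next(k for k, block in enumerate(partition) if a in block)
    nxt = local[:i] + (delta[local[i], a],) + local[i + 1:]
    return lam[local[i], a], nxt


def equivalence_query(delta, lam, s0, alphabet, partition):
    """Breadth-first search for a shortest counter-example; None plays the role of 'yes'."""
    start = (s0, (s0,) * len(partition))
    seen, queue = {start}, deque([(start, ())])
    while queue:
        (s, local), word = queue.popleft()
        for a in alphabet:
            h_out, h_next = hypothesis_step(delta, lam, partition, local, a)
            if lam[s, a] != h_out:
                return word + (a,)
            nxt = (delta[s, a], h_next)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + (a,)))
    return None


def cl_star(delta, lam, s0, alphabet):
    """Start from singleton blocks; merge the blocks touched by each counter-example."""
    partition = [frozenset([a]) for a in alphabet]
    rounds = 0
    while True:
        ce = equivalence_query(delta, lam, s0, alphabet, partition)
        if ce is None:
            return partition, rounds
        involved = [b for b in partition if any(a in b for a in ce)]  # InvolvedSets
        merged = frozenset().union(*involved)                          # Composition
        partition = [b for b in partition if b not in involved] + [merged]
        rounds += 1


# Hypothetical SUL with state (x, y): 'a' sets x, 'b' outputs 1 - x, 'c' toggles y.
delta, lam = {}, {}
for x in (0, 1):
    for y in (0, 1):
        delta[(x, y), "a"], lam[(x, y), "a"] = (1, y), 1
        delta[(x, y), "b"], lam[(x, y), "b"] = (x, y), 1 - x
        delta[(x, y), "c"], lam[(x, y), "c"] = (x, 1 - y), 1 - y

final_partition, rounds = cl_star(delta, lam, (0, 0), "abc")
print([sorted(b) for b in final_partition], rounds)
```

In this toy run the first counter-example involves only a and b, so those two blocks are merged while c stays in its own component.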

#### **4.1 Termination Analysis**

To prove the termination of our algorithm, we start with the following lemma which indicates how the counter-example is used to merge the partitions.

**Lemma 3.** *Let* $I^F = \{I_1, \dots, I_m\}$ *be a partition of the system's input alphabet. If the teacher responds with a counter-example* CE*, then there are at least two actions* $u \in I_i$, $v \in I_j$ *in* CE *such that* $I_i \neq I_j \wedge I_i, I_j \in I^F$*.*

*Proof.* We prove this by contradiction. Suppose CE consists of actions that all belong to $I_i$. Let $C_i = L^*(M, I_i)$ with output function $\lambda_{C_i}$. Since the output of L<sup>∗</sup> is always the correctly learned FSM of the SUL, $\lambda_M(\mathsf{CE}) = \lambda_{C_i}(\mathsf{CE})$. Also, since $C_i$ is a component of $H$ produced by LearnInParts, $\lambda_H(\mathsf{CE}) = \lambda_{C_i}(\mathsf{CE})$ based on Definition 2. This means CE cannot be a counter-example. □

The next lemma uses Lemma 3 to show how counter-examples will ensure progress in the algorithm, eventually guaranteeing termination.

**Lemma 4.** *At each round of the algorithm CL<sup>∗</sup>,* $|I^F|$ *decreases by at least 1.*

*Proof.* By Lemma 3, at each round of the algorithm, at least two dependent sets are found by InvolvedSets, and the algorithm merges these dependent sets into a single set. Thus the size of the partition decrements by at least one; hence, the lemma follows. □

Now we have the necessary ingredients to prove termination below.

**Theorem 1.** *The Compositional Learning Algorithm terminates.*

*Proof.* Assume, towards contradiction, that the algorithm does not terminate. Let $I$ be the alphabet, and $I^F_k$ be the partition of $I$ after the $k$th round of the algorithm. By Lemma 4, after at most $k = |I| - 1$ rounds, $|I^F_k| = 1$. Also by the assumption, the algorithm has not terminated at round $k$. Since $I^F_k = \{I\}$, the algorithm reduces to algorithm L<sup>∗</sup>, which terminates. Hence, the contradiction. □

We prove next that every time we merge two partitions, there is a sound reason (i.e., dependency of actions) for it.

**Theorem 2.** *Let* CE *be the minimal counter-example returned by the oracle at round* $k$ *of the algorithm and* $I^F = \{I_1, \dots, I_n\}$ *the partition of the alphabet at the same round. Then, all actions in* CE *are dependent.*

Proof. Let $\mathsf{CE} = wa$, $w \in I^*$ and $a \in I$, and $d = \{d_1, \dots, d_m\}$ be an independent partition for the SUL $M$. Assume some actions in $w$ are independent from $a$ (proof by contradiction). Let $d_k$ be the set in $d$ that includes $a$. The set $I \setminus d_k$ contains all the actions independent from $a$. For $M$, we define $O_M = P_{d_k}(\lambda_M(wa))$; according to Lemma 2, $O_M = \lambda_{P(M, d_k)}(P_{d_k}(wa))$. The algorithm makes the hypothesis $H = P(M, I_1) \parallel \dots \parallel P(M, I_n)$ at the current round $k$. Since $d_k$ is the union of a subset of $I^F$ (the algorithm has not terminated yet), $O_H = P_{d_k}(\lambda_H(wa)) = \lambda_{P(H, d_k)}(P_{d_k}(wa))$. If $O_H \neq O_M$, then $P_{d_k}(wa)$ is a smaller counter-example than $wa$, which is a contradiction. Otherwise, if $O_H = O_M$, given that $wa$ is a counter-example, $P_{I \setminus d_k}(\lambda_M(wa)) \neq P_{I \setminus d_k}(\lambda_H(wa))$; if so, $P_{I \setminus d_k}(wa)$ is a smaller counter-example, hence the contradiction. □

By Theorems 2 and 1, we have shown that the algorithm detects the independent action sets and eventually terminates. The next theorem is formulated to show that it terminates as soon as all dependent action sets have been detected.

**Theorem 3.** Let $I^F = \{I_1, \dots, I_n\}$ be an independent partition of the alphabet at round $k$. The algorithm terminates in this round.

Proof. We prove this by contradiction. Assume that the algorithm does not terminate, and CE is the minimal counter-example returned by the oracle. By Theorem 2, InvolvedSets returns two or more dependent sets from $I^F$. Since all the elements in $I^F$ are pairwise independent, we arrive at a contradiction. □

### **4.2 Processing Counter-examples**

As mentioned in Theorem 2, we require all the actions in a minimal counterexample returned by the oracle to be dependent. However, most equivalence checking methods do not find the minimal counter-example. For a non-minimal counter-example, we define a process called "distillation", which asks a number of extra queries to find the dependent actions. It iteratively gets a subset of InvolvedSets(CE, $I^F$) in the order of their sizes and merges its members together, producing a set $M$. The algorithm returns $P_M(\mathsf{CE})$ as output if it is a counterexample.

Suppose CE is the counter-example returned by the oracle at round $k$ of the algorithm, and $I^F$ is the alphabet partition at that round. To distill two or more dependent sets from CE, we follow Algorithm 2. The function CutCE on line 2 takes a counter-example CE and returns the smallest prefix of CE that is also a counter-example (i.e., the SUL and the hypothesis model produce different outputs for it). Then, iteratively, it gets a subset of InvolvedSets(CE, $I^F$) in the order of their sizes and merges its members together, producing a set $M$. The algorithm returns $P_M(\mathsf{CE})$ as output if it is a counter-example.

The cost of the CE distillation algorithm is exponential in the size of CE in the worst case. However, in the results section, we show that in practice, the cost of this part is not very significant compared to the total cost of learning.

**Theorem 4.** All actions in the output of the CE distillation algorithm are dependent.

The proof is omitted as it is similar to the proof of Theorem 2.

**Algorithm 2:** CE distillation

```
Result: CE_M
 1 Input: I^F = {I_1,...,I_n}, CE, M, H
 2 CE ← CutCE(CE)
 3 D ← InvolvedSets(CE, I^F)
 4 for k ∈ {2, ..., size(D)} do
 5     C ← all k-combinations(D)
 6     while C is not empty do
 7         I ← C.pop
 8         A ← ⋃_{i∈I} I_i
 9         CE_A ← P_A(CE)
10         if CE_A is a counter-example then
11             return CE_A
12         end
13     end
14 end
```
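
The distillation procedure can be sketched in Python as follows. This is an illustrative reading of Algorithm 2, not the authors' implementation: the predicate `is_ce` (which would issue queries to the SUL and the hypothesis) is supplied by the caller, and the SUL and singleton-partition hypothesis below are hypothetical.

```python
from itertools import combinations


def cut_ce(ce, is_ce):
    """Shortest prefix of CE that is itself a counter-example (CutCE)."""
    for n in range(1, len(ce) + 1):
        if is_ce(ce[:n]):
            return ce[:n]
    raise ValueError("the given word is not a counter-example")


def distill(ce, partition, is_ce):
    """Try merged subsets of the involved blocks in order of size; return the
    first projection of CE that is still a counter-example."""
    ce = cut_ce(ce, is_ce)
    involved = [j for j, block in enumerate(partition)
                if any(a in block for a in ce)]
    for k in range(2, len(involved) + 1):
        for subset in combinations(involved, k):
            merged = set().union(*(partition[j] for j in subset))
            ce_a = tuple(a for a in ce if a in merged)  # P_A(CE)
            if is_ce(ce_a):
                return ce_a
    return ce  # fallback; by Lemma 3 at least two blocks should be involved


def sul_outputs(word):
    """Hypothetical SUL: 'a' sets x, 'b' outputs 1 - x, 'c' toggles y and outputs it."""
    x, y, out = 0, 0, []
    for a in word:
        if a == "a":
            x = 1
            out.append(1)
        elif a == "b":
            out.append(1 - x)
        else:
            y = 1 - y
            out.append(y)
    return out


def hyp_outputs(word):
    """Hypothesis for the singleton partition: the 'b' component never sees 'a'."""
    y, out = 0, []
    for a in word:
        if a == "c":
            y = 1 - y
            out.append(y)
        else:
            out.append(1)
    return out


def is_ce(word):
    return sul_outputs(word) != hyp_outputs(word)


print(distill(tuple("cacbc"), [{"a"}, {"b"}, {"c"}], is_ce))
```

The non-minimal counter-example is first cut to its shortest failing prefix, and the symbol c is then dropped because the projection onto the merged block of a and b already exposes the disagreement.
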
## **5 Empirical Evaluation**

In this section, we present the design and the results of the experiments carried out to evaluate our approach, in order to answer the following research questions:

**RQ1** Does CL<sup>∗</sup> require fewer resets, compared to L<sup>∗</sup>?

**RQ2** Does CL<sup>∗</sup> require fewer input symbols, compared to L<sup>∗</sup>?

As stated in Section 1, these two research questions measure the efficiency of a learning method in a machine-independent manner: the number of input symbols summarises the total cost of a learning campaign, while the number of resets summarises one of its most costly parts. Note that although active learning processes are structured in terms of queries, the queries used in the processes have vastly different lengths, and it has been observed earlier that the total number of input symbols is a more accurate metric for comparing learning algorithms than the number of queries [36].

#### **5.1 Subject Systems**

A meaningful benchmark for our method should feature systems of various state sizes and various numbers of parallel components and with a non-trivial structure that may require multiple learning rounds. Also, we would like to have realistic systems, so that our comparisons have meaningful practical implications.

To this end, we choose the Body Comfort System (BCS) [25], which is an automotive software product line (SPL) of a Volkswagen Golf model. This SPL has 27 components, each representing a feature that provides specific functionality. The transition system of each component is provided in a detailed technical report [24]. We use the finite state machines of the components constructed from the transition system representations in [35] and compose several random samples utilising the interleaving parallel composition (Definition 2) to build the product FSMs. We automatically constructed 100 FSMs consisting of a minimum of two and a maximum of nine components in this case study. The maximum number is chosen due to the performance limits of L<sup>∗</sup>; beyond this limit, our learning campaign for L<sup>∗</sup> could take more than four hours. All experiments were conducted on a computer with an Intel® Core™ M-5Y10c CPU and 8GB of physical memory running Ubuntu version 20 and LearnLib version 0.16.0. Our subject systems have a minimum of 300 states and a maximum of 3840 states, and their average number of states is 1278.2 with a standard deviation of 847. We started the calculation of the metrics for subject systems of at least 300 states, since for small subject systems, the advantage of compositional learning is not significant.

## **5.2 Experiment Design**

To answer the research questions, we implemented the compositional learning algorithm on top of the LearnLib framework [30]. This implementation uses the equivalence oracle in two places: to learn projections in the LearnInParts function and to check the hypothesis/SUL equivalence. The performance of the algorithm significantly relies on the type of equivalence queries used by the underlying L<sup>∗</sup> algorithm. We experimented with a number of equivalence methods and settled upon using random walks; when using deterministic algorithms such as the W- and the WP-method, for large systems, the cost of equivalence queries becomes prohibitively high and obscures any gain obtained from compositionality. To ensure that our results are sound, we have carried out similar experiments by using an additional deterministic equivalence query at the end of the learning campaign, when the last random equivalence query does not return any counter-example. This additional step verifies our comparisons when an assurance about the accuracy of the learning process is required. More details about these additional experiments can be found in our public lab package [23] (https://github.com/faezeh-lbf/CL-Star).

We enabled caching, since caching significantly reduces repetitive queries. We repeat each learning process three times, comparing the number of resets and input symbols for L<sup>∗</sup> and CL∗.

In addition to reporting the median metrics, their standard deviations, and the relative percentage of improvements, we use the statistical T-test to answer the research questions with statistical confidence and report the p-values. We analyse the distribution of the results and establish their normality using Ktests. We use the SciPy [20] library of Python to perform statistical analysis and Seaborn [38] for visualising the results.

## **5.3 Results**

In this section, we first present the results of our experiments and use them to answer our research questions. Then we show how the number of components in an FSM affects the efficiency of our algorithm. Finally, we discuss threats to the validity of our empirical results.

Fig. 2: The total number of input symbols and resets in the CL<sup>∗</sup> and L<sup>∗</sup> methods

We cluster the benchmark into eight categories based on the FSM's number of states and illustrate the distribution of input symbols and resets for each cluster in Figure 2. In this figure, the CL<sup>∗</sup> and L<sup>∗</sup> methods are compared based on the metrics mentioned. The scale of the x-axis (the value of metrics) is logarithmic.

Tables 1 and 2 summarise the results of our experiments. For each category, we calculate the median and standard deviation of our metrics (the number of input symbols and resets) both for L<sup>∗</sup> and CL<sup>∗</sup>. The metric "progress percentage" is defined to measure the improvement brought about by compositional learning (compared to L<sup>∗</sup>). For each metric, the progress percentage is calculated as $(1 - \frac{p}{q}) \times 100$, where $p$ and $q$ are the values of that metric in CL<sup>∗</sup> and L<sup>∗</sup>, respectively. A positive progress percentage in a metric shows that CL<sup>∗</sup> is more efficient in terms of that metric. To measure the statistical significance, we used the one-sided paired sample T-test to check if there was a significant difference (p < 0.05) between the metrics in the two algorithms.
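
The progress percentage and the test statistic of a paired T-test can be computed as in the following sketch. All numbers are made up for illustration and are not taken from Tables 1 and 2; the function names are assumptions. The sketch stops at the t statistic; obtaining the p-value requires the Student t distribution, for which a library routine such as `scipy.stats.ttest_rel` would typically be used.

```python
import math
from statistics import mean, stdev


def progress_percentage(p, q):
    """(1 - p/q) * 100, where p is the metric for CL* and q the metric for L*."""
    return (1 - p / q) * 100


def paired_t_statistic(cl_values, l_values):
    """t statistic of the paired differences (L* minus CL*), one pair per SUL."""
    diffs = [l - c for c, l in zip(cl_values, l_values)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Hypothetical measurements for three SULs.
cl_symbols = [5, 6, 9]
l_symbols = [10, 12, 14]

print(progress_percentage(25, 100))
print(paired_t_statistic(cl_symbols, l_symbols))
```

A large positive t statistic indicates that L<sup>∗</sup> consistently needs more symbols than CL<sup>∗</sup> on the same SULs.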


(2400, 3840] 24700222.5 14837416.08 4385086 13817389.06 68.42 2.66e-12

Table 1: Comparing the total number of input symbols in the CL<sup>∗</sup> and L<sup>∗</sup> methods


Table 2: Comparing the total number of resets in the CL∗ and L∗ methods

Tables 1 and 2 both indicate major improvements, particularly for large systems, in terms of the total number of input symbols and resets, respectively. Compositional learning reduces the number of symbols by up to 70.80 percent and the number of resets by up to 95.83 percent. The statistical tests also confirm these observations, and the p-values obtained from the tests are in all cases very low; for the number of input symbols the p-values range from $10^{-2}$ to $10^{-12}$, while for resets they range from $10^{-6}$ to $10^{-43}$, which are well below the usual significance threshold (0.05) and represent a very high statistical significance.

Fig. 3: The diagrams of improvement brought about by compositional learning vs. size of the SUL in terms of the number of states (left) and components (right).

The plots in Figure 3 visualise the improvements brought about by compositional learning. These plots demonstrate that the saving due to compositional learning increases as the number of components in SULs increases. We further analysed the trends of our measured metrics in terms of the number of states and the number of parallel components. These trends are depicted for the total number of input symbols in Figure 4 and for the number of resets in Figure 5, respectively. These figures indicate that the increase of both metrics with the number of states is more moderate for the compositional learning approach, i.e., compositional learning is more scalable. More importantly, the right-hand side of both figures signifies the effect of compositional learning when the number of parallel components increases while the number of states remains fixed.

Fig. 4: The effect of FSM sizes in terms of the number of components and states on the total number of input symbols.

Figure 6 shows the effect of the number of components on the total number of input symbols for a fixed state-space size for algorithms L<sup>∗</sup> and CL∗. In this plot, as the number of components increases, the corresponding dot will become darker and larger. According to this figure, the learning cost is lower for SULs with more components in both L<sup>∗</sup> and CL∗. Still, for CL<sup>∗</sup> (the right side), the cost of learning SULs with more components is significantly lower because we structurally learn these components essentially independently.

As mentioned in Section 4.2, the cost of the CE distillation process can increase exponentially in the size of the counter-example. However, in practice, it seems to be much more tractable. To evaluate this, we count the number of input symbols required by the CE distillation process to learn each SUL. The median value of this metric is 1961 input symbols, which is insignificant compared to the total cost of learning. In fact, the cost of the CE distillation process for each group in Table 1 is between 0.037 and 0.12 percent of the total learning cost; the reported total learning cost (total number of input symbols) includes the cost of CE distillation.

## **5.4 Threats to Validity**

In this section, we summarise the major threats to the validity of our empirical conclusions. First, we analyse the threats to conclusion validity, i.e., whether the empirical conclusions necessarily follow from the experiments carried out. Then, we discuss the threats to external validity concerning the generalisation of our results to other systems.

We mitigated conclusion validity threats by using statistical tests to ensure that our observations (both in terms of improvement percentages in Tables 1 and 2 and the visual observations in Figure 2) do represent a statistically significant improvement. We opt for one-sided paired sample T-tests in order to minimise the threats to conclusion validity. We only conclude that CL<sup>∗</sup> is more efficient than L<sup>∗</sup> when there is a meaningful difference (p < 0.05) between the results of L<sup>∗</sup> and CL<sup>∗</sup>. To make sure that the chosen statistical test is applicable, we analysed the distribution of the data first.

Fig. 5: The effect of the size of FSMs in terms of the number of components and states on the total number of required input resets.

Fig. 6: The relation between the total number of symbols and the number of states and components for the algorithms L<sup>∗</sup> (left) and CL<sup>∗</sup> (right).

We mitigated the threats to external validity by using subject systems that are based on practical systems rather than using randomly generated FSMs. However, further research is needed to analyse the performance of our approach based on other benchmarks from other domains. We also mitigated the effect of using random equivalence queries by repeating the experiments with a final deterministic query.

## **6 Conclusions**

In this paper, we presented a compositional learning method based on Angluin's algorithm L<sup>∗</sup> that detects and independently learns interleaving parallel components of the system under learning. We proved that our algorithm, called CL<sup>∗</sup>, is correct and we empirically showed that it yields significant savings in the number of input symbols and the number of resets in a learning campaign. The savings increase significantly with the number of parallel components.

Our algorithm is naturally amenable to parallelisation and developing a parallel implementation is a natural next step. A more thorough investigation of counter-example processing in order to efficiently find a minimal counterexample is an area of further research, particularly, in the light of the recent results in this area [13]. Finding a trade-off between using deterministic and random (or mutation-based) equivalence queries is another area of future research. We would also like to investigate the possibility of developing equivalence queries that take the structure of the systems into account: we have observed that much of the effort in the final equivalence query (on the composed system) is redundant and the final equivalence query can be made much more efficient by only considering the dependencies among purportedly independent partitions. Finally, extending our notion of parallel composition to allow for a possible synchronisation of components is another direction of future work; we believe inspirations from concurrency theory and in particular, Milner and Moller's prime decomposition theorem [26] may prove effective in this regard. Independently from our work, Neele and Sammartino [29] proposed an approach to learn synchronous parallel composition, under the assumption of knowing the alphabets of the components. This is a promising approach to incorporate synchronous parallel composition into our framework.

## **Acknowledgments**

We would like to thank Rasta Tadayon and Amin Asadi Sarijalou for their contributions to the early stages of this work. The work of Mohammad Reza Mousavi was supported by the UKRI Trustworthy Autonomous Systems Node in Verifiability, Grant Award Reference EP/V026801/2. We thank the reviewers of FOSSACS for their insightful and constructive comments, which, in our view, led to improvements in our final paper. We thank the Artifact Evaluation committee at ESOP/FOSSACS for their careful review of our lab package.

## **References**

1. Aarts, F., de Ruiter, J., Poll, E.: Formal models of bank cards for free. In: Sixth IEEE International Conference on Software Testing, Verification and Validation, ICST 2013 Workshops Proceedings, Luxembourg, Luxembourg, March 18-22, 2013. pp. 461–468. IEEE Computer Society (2013). https://doi.org/10.1109/ICSTW.2013.60



Integrated Formal Methods - 16th International Conference, IFM 2020, Lugano, Switzerland, November 16-20, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12546, pp. 22–40. Springer (2020). https://doi.org/10.1007/978-3-030-63461-2_2


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Pebble minimization: the last theorems

Gaëtan Douéneau-Tabot<sup>1,2</sup>

<sup>1</sup> Université Paris Cité, CNRS, IRIF, F-75013, Paris, France

<sup>2</sup> Direction générale de l'armement - Ingénierie des projets, Paris, France

doueneau@irif.fr

Abstract Pebble transducers are nested two-way transducers which can drop marks (named "pebbles") on their input word. Such machines can compute functions whose output size is polynomial in the size of their input. They can be seen as simple recursive programs whose recursion height is bounded. A natural problem is, given a pebble transducer, to compute an equivalent pebble transducer with minimal recursion height. This problem has been open since the introduction of the model. In this paper, we study two restrictions of pebble transducers, that cannot see the marks ("blind pebble transducers" introduced by Nguyên et al.), or that can only see the last mark dropped ("last pebble transducers" introduced by Engelfriet et al.). For both models, we provide an effective algorithm for minimizing the recursion height. The key property used in both cases is that a function whose output size is linear (resp. quadratic, cubic, etc.) can always be computed by a machine whose recursion height is 1 (resp. 2, 3, etc.). We finally show that this key property fails as soon as we consider machines that can see more than one mark.

Keywords: Pebble transducers · Polyregular functions · Blind pebble transducers · Last pebble transducers · Factorization forests.

## 1 Introduction

Transducers are finite-state machines obtained by adding outputs to finite automata. They are very useful in many areas such as coding, computer arithmetic, language processing or program analysis, and more generally in data stream processing. In this paper, we consider deterministic transducers which compute functions from finite words to finite words. In particular, a deterministic two-way transducer is a two-way automaton with outputs. This model describes the class of regular functions, which is often considered as one of the functional counterparts of regular languages. It has been intensively studied for its properties such as closure under composition [5], equivalence with logical transductions [12] or regular expressions [7], decidable equivalence problem [14], etc.

Pebble transducers and polyregular functions. Two-way transducers can only describe functions whose output size is at most linear in the input size. A possible solution to overcome this limitation is to consider nested two-way transducers. In particular, the model of $k$-pebble transducer has been studied for a long time [13]. For $k = 1$, a 1-pebble transducer is just a two-way transducer. For $k \geqslant 2$, a $k$-pebble transducer is a two-way transducer that, when on any position $i$ of its input word, can call a $(k-1)$-pebble transducer. The latter takes as input the original input where position $i$ is marked by a "pebble". The main two-way transducer then outputs the concatenation of all the outputs produced along its calls. The intuitive behavior of a 3-pebble transducer is depicted in fig. 1. It can be seen as a recursive program whose recursion stack has height 3. The class of functions computed by pebble transducers is known as polyregular functions. It has been intensively studied due to its properties such as closure under composition [11], equivalence with logical interpretations [4], etc.

Figure 1: Behavior of a 3-pebble transducer.

Optimization of pebble transducers. Given a $k$-pebble transducer computing a function $f$, a very natural problem is to compute the least possible $1 \leqslant \ell \leqslant k$ such that $f$ can be computed by an $\ell$-pebble transducer. Furthermore, we can be interested in effectively building an $\ell$-pebble transducer for $f$. Both questions are open, but they are meaningful since they ask whether we can optimize the recursion height (i.e. the running time) of a program.

It is easy to observe that if $f$ is computed by a $k$-pebble transducer, then $|f(u)| = O(|u|^k)$. It was first claimed in a LICS 2020 paper that the minimal recursion height of $f$ (i.e. the least possible $\ell$ such that $f$ can be computed by an $\ell$-pebble transducer) was exactly the least possible $\ell$ such that $|f(u)| = O(|u|^\ell)$. However, Bojańczyk recently disproved this statement in [3, Theorem 6.3]: the function $\textsf{inner-squaring} : u_1 \# \cdots \# u_n \mapsto (u_1 \#)^n \cdots (u_n \#)^n$ can be computed by a 3-pebble transducer and is such that $|\textsf{inner-squaring}(u)| = O(|u|^2)$, but it cannot be computed by a 2-pebble transducer. Other counterexamples were given in [16] using different proof techniques. Therefore, computing the minimal recursion height of $f$ is believed to be hard, since this value not only depends on the output size of $f$, but also on the word combinatorics of this output.
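The quadratic growth of inner-squaring can be checked directly on its input-output specification. The following Python sketch only implements the function itself (it is not a pebble transducer); the name `inner_squaring` and the use of `str.split` are choices made for this illustration.

```python
def inner_squaring(w: str) -> str:
    """Map u1#...#un to (u1#)^n ... (un#)^n, where n is the number of blocks."""
    blocks = w.split("#")          # the blocks u1, ..., un (possibly empty)
    n = len(blocks)
    return "".join((block + "#") * n for block in blocks)


if __name__ == "__main__":
    w = "a#bc"
    print(inner_squaring(w))       # a#a#bc#bc#
    # The output has length n * (|w| + 1), hence at most (|w| + 1)^2.
    print(len(inner_squaring(w)), len(w.split("#")) * (len(w) + 1))
```

Each block is repeated $n$ times, and $n \leqslant |w| + 1$, which gives the $O(|u|^2)$ bound stated above.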

Optimization of blind pebble transducers. A subclass of pebble transducers, named blind pebble transducers, was recently introduced in [17]. A blind $k$-pebble transducer is essentially a $k$-pebble transducer, with the difference that the positions are no longer marked when making recursive calls. The behavior of a blind 3-pebble transducer is depicted in fig. 2. The class of functions computed by blind pebble transducers is strictly included in polyregular functions [10,17]. The main result of [17] shows that for blind pebble transducers, the minimal recursion height for computing a function only depends on the growth of its output. More precisely, if $f$ is computed by a blind $k$-pebble transducer, then the least possible $1 \leqslant \ell \leqslant k$ such that $f$ can be computed by a blind $\ell$-pebble transducer is the least possible $\ell$ such that $|f(u)| = O(|u|^\ell)$.

Figure 2: Behavior of a blind 3-pebble transducer.

Contributions. In this paper, we first give a new proof of the connection between minimal recursion height and growth of the output for blind pebble transducers. Furthermore, our proof provides an algorithm that, given a function computed by a blind $k$-pebble transducer, builds a blind $\ell$-pebble transducer which computes it, for the least possible $1 \leqslant \ell \leqslant k$. This effective result is not claimed in [17], and our proof techniques significantly differ from theirs. Indeed, we make heavy use of factorization forests, which have already been used as a powerful tool in the study of pebble transducers [2,8,10].

Secondly, the main contribution of this paper is to show that the (effective) connection between minimal recursion height and growth of the output also holds for the class of last pebble transducers (introduced in [13]). Intuitively, a last $k$-pebble transducer is a $k$-pebble transducer where a called submachine can only see the position of its call, but not the full stack of the former positions. The behavior of a last 3-pebble transducer is depicted in fig. 3. Observe that a blind $k$-pebble transducer is a restricted version of a last $k$-pebble transducer. Formally, we show that if $f$ is computed by a last $k$-pebble transducer, then the least possible $\ell$ such that $f$ can be computed by a last $\ell$-pebble transducer is the least possible $\ell$ such that $|f(u)| = O(|u|^\ell)$. Furthermore, our proof gives an algorithm that effectively builds a last $\ell$-pebble transducer computing $f$.

Figure 3: Behavior of a last 3-pebble transducer.

As a third theorem, we show that our result for last pebble transducers is tight, in the sense that the connection between minimal recursion height and growth of the output does not hold for more powerful models. More precisely, we define the model of last-last $k$-pebble transducers, which extends last $k$-pebble transducers by allowing them to see the two last positions of the calls (and not only the last one). We show that for all $k \geqslant 1$, there exists a function $f$ such that $|f(u)| = O(|u|^2)$ and that is computed by a last-last $(2k+1)$-pebble transducer, but cannot be computed by a last-last $2k$-pebble transducer. The proof of this result relies on a counterexample presented by Bojańczyk in [2].

Outline. We introduce two-way transducers in section 2. In section 3 we describe blind pebble transducers and last pebble transducers. We also state our main results that connect the minimal recursion height of a function to the growth of its output. Their proof goes over sections 4 to 6. In section 7, we finally show that these results cannot be extended to two visible marks.

## 2 Preliminaries on two-way transducers

Capital letters $A, B$ denote alphabets, i.e. finite sets of letters. The empty word is denoted by $\varepsilon$. If $u \in A^*$, let $|u| \in \mathbb{N}$ be its length, and for $1 \leqslant i \leqslant |u|$ let $u[i]$ be its $i$-th letter. For positions $i$ and $j$, we let $u[i{:}j]$ be $u[i]u[i+1] \cdots u[j]$ (empty if $j < i$). If $a \in A$, let $|u|_a$ be the number of letters $a$ occurring in $u$. We assume that the reader is familiar with the basics of automata theory, in particular two-way automata and monoid morphisms. The type of total (resp. partial, i.e. possibly undefined on some inputs) functions is denoted $S \to T$ (resp. $S \rightharpoonup T$).

The machines described in this paper are always deterministic.

Definition 2.1. A two-way transducer $\mathcal{T} = (A, B, Q, q_0, F, \delta, \lambda)$ consists of:

– an input alphabet $A$ and an output alphabet $B$;


The semantics of a two-way transducer $\mathcal{T}$ is defined as follows. When given as input a word $u \in A^*$, $\mathcal{T}$ has at its disposal a read-only input tape containing $\vdash u \dashv$. The marks $\vdash$ and $\dashv$ are used to detect the borders of the tape; by convention we denote them by positions $0$ and $|u|+1$ of $u$. Formally, a configuration over $u$ is a tuple $(q, i)$ where $q \in Q$ is the current state and $0 \leqslant i \leqslant |u|+1$ is the position of the reading head. The transition relation $\to$ is defined as follows. Given a configuration $(q, i)$, let $(q', \star) := \delta(q, u[i])$. Then $(q, i) \to (q', i')$ whenever either $\star = \triangleleft$ and $i' = i-1$ (move left), or $\star = \triangleright$ and $i' = i+1$ (move right), with $0 \leqslant i' \leqslant |u|+1$. A run is a sequence of configurations $(q_1, i_1) \to \cdots \to (q_n, i_n)$. Accepting runs are those that begin in $(q_0, 0)$ and end in a configuration of the form $(q, |u|+1)$ with $q \in F$ (and never visit such a configuration before).

The partial function $f : A^* \rightharpoonup B^*$ computed by the two-way transducer $\mathcal{T}$ is defined as follows: for $u \in A^*$, if there exists an accepting run on $u$, then it is unique, and $f(u)$ is defined as $\lambda(q_1, (\vdash u \dashv)[i_1]) \cdots \lambda(q_n, (\vdash u \dashv)[i_n]) \in B^*$. The class of functions computed by two-way transducers is called regular functions.

*Example 2.2.* Let $\widetilde{u}$ be the mirror image of $u \in A^*$. Let $\# \notin A$ be a fresh symbol. The function $\textsf{map-reverse} : u_1 \# \cdots \# u_n \mapsto \widetilde{u_1} \# \cdots \# \widetilde{u_n}$ can be computed by a two-way transducer that reads each factor $u_j$ from right to left.
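To make the semantics concrete, here is a minimal Python sketch of a simulator for deterministic two-way transducers, together with one possible transducer for map-reverse. Everything named here is an assumption of the sketch: the border symbols `⊢` and `⊣`, the encoding of moves as `-1`/`+1`, the three states `fwd`, `back`, `skip`, and the `max_steps` guard. It is one way to realise the example, not necessarily the machine the example has in mind.

```python
LEFT_END, RIGHT_END = "⊢", "⊣"   # border marks (symbols assumed for this sketch)


def run(delta, lam, q0, final, u, max_steps=100_000):
    """Simulate a deterministic two-way transducer on u and return its output."""
    tape = LEFT_END + u + RIGHT_END          # positions 0 .. len(u) + 1
    q, i, out = q0, 0, []
    for _ in range(max_steps):
        out.append(lam(q, tape[i]))          # output of the current configuration
        if q in final and i == len(u) + 1:   # first accepting configuration
            return "".join(out)
        q, move = delta(q, tape[i])          # move is -1 (left) or +1 (right)
        i += move
    raise RuntimeError("no accepting configuration reached within max_steps")


def delta(q, a):
    """Hypothetical transition function for map-reverse."""
    if q == "fwd":     # scan right to the end of the current block
        return ("back", -1) if a in ("#", RIGHT_END) else ("fwd", +1)
    if q == "back":    # read the block from right to left
        return ("skip", +1) if a in ("#", LEFT_END) else ("back", -1)
    # q == "skip": cross the block again, then step over its separator
    return ("fwd", +1) if a == "#" else ("skip", +1)


def lam(q, a):
    """Hypothetical output function: letters are emitted on the leftward pass."""
    if q == "back" and a not in ("#", LEFT_END):
        return a
    if q == "skip" and a == "#":
        return "#"
    return ""


if __name__ == "__main__":
    print(run(delta, lam, "fwd", {"skip"}, "abc#de#f"))   # cba#ed#f
```

The head visits every position a bounded number of times (three here), which is consistent with the linear output size of regular functions.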

It is well-known that the domain of a regular function is always a regular language (see e.g. [18]). From now on, we assume without loss of generality that our two-way transducers only compute total functions (in other words, they have exactly one accepting run on each $u$). Furthermore, we assume that $\lambda(q, \vdash) = \lambda(q, \dashv) = \varepsilon$ for all $q \in Q$ (we only lose generality for the image of $\varepsilon$).

In the rest of this section, $\mathcal{T}$ denotes a two-way transducer with input alphabet $A$, output alphabet $B$ and output function $\lambda$. Now, we define the crossing sequence in a position $1 \leqslant i \leqslant |u|$ of input $u$. Intuitively, it regroups the states of the accepting run which are visited in this position.

Definition 2.3. Let $u \in A^*$ and $1 \leqslant i \leqslant |u|$. Let $(q_1, i_1) \to \cdots \to (q_n, i_n)$ be the accepting run of $\mathcal{T}$ on $u$. The *crossing sequence* of $\mathcal{T}$ in $i$, denoted $\mathsf{cross}^u_{\mathcal{T}}(i)$, is defined as the sequence $(q_j)_{1 \leqslant j \leqslant n \text{ and } i_j = i}$.

If $\mu : A^* \to M$ is a monoid morphism, we say that any $m, m' \in M$ and $a \in A$ define a $\mu$-context that we denote by $m \llbracket a \rrbracket m'$. It is well-known that the crossing sequence in a position of the input only depends on the context of this position, for a well-chosen monoid, as claimed in proposition 2.4 (see e.g. [7]).

Proposition 2.4. One can build a finite monoid $\mathbb{T}$ and a monoid morphism $\mu : A^* \to \mathbb{T}$, called the *transition morphism* of $\mathcal{T}$, such that for all $u \in A^*$ and $1 \leqslant i \leqslant |u|$, $\mathsf{cross}^u_{\mathcal{T}}(i)$ only depends on $\mu(u[1{:}i-1])$, $u[i]$ and $\mu(u[i+1{:}|u|])$. Thus we denote it $\mathsf{cross}_{\mathcal{T}}(\mu(u[1{:}i-1]) \llbracket u[i] \rrbracket \mu(u[i+1{:}|u|]))$.

Finally, let us define "the output produced below position i".

Definition 2.5. Let $u \in A^*$ and $1 \leqslant i \leqslant |u|$ and $q_1 \cdots q_n := \mathsf{cross}^u_{\mathcal{T}}(i)$. We define the *production* of $\mathcal{T}$ in $i$, denoted $\mathsf{prod}^u_{\mathcal{T}}(i)$, as $\lambda(q_1, u[i]) \cdots \lambda(q_n, u[i])$.

By proposition 2.4, it also makes sense to define $\mathsf{prod}_{\mathcal{T}}(m \llbracket a \rrbracket m') \in B^*$ to be $\mathsf{prod}^u_{\mathcal{T}}(i)$ whenever $m = \mu(u[1{:}i-1])$, $m' = \mu(u[i+1{:}|u|])$ and $a = u[i]$.
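Crossing sequences and productions can be read off an accepting run. The Python sketch below computes both for a toy two-way transducer that outputs $uu$ by sweeping right, left, then right again; the transducer, its state names `p1`, `p2`, `p3`, the border symbols and the move encoding are all assumptions made for this illustration.

```python
LEFT_END, RIGHT_END = "⊢", "⊣"   # border marks (symbols assumed for this sketch)


def accepting_run(delta, q0, final, u):
    """List the configurations (state, position) of the accepting run on u."""
    tape = LEFT_END + u + RIGHT_END
    q, i = q0, 0
    configs = [(q, i)]
    while not (q in final and i == len(u) + 1):
        q, move = delta(q, tape[i])
        i += move
        configs.append((q, i))
    return configs


def cross(configs, i):
    """Crossing sequence: states of the run visited in position i, in order."""
    return [q for (q, pos) in configs if pos == i]


def prod(configs, lam, u, i):
    """Production: concatenated outputs of the crossing sequence in position i."""
    return "".join(lam(q, u[i - 1]) for q in cross(configs, i))


def delta(q, a):
    """Toy transducer for u -> uu: sweep right (p1), left (p2), right (p3)."""
    if q == "p1":
        return ("p2", -1) if a == RIGHT_END else ("p1", +1)
    if q == "p2":
        return ("p3", +1) if a == LEFT_END else ("p2", -1)
    return ("p3", +1)


def lam(q, a):
    """Letters are copied during both rightward sweeps."""
    return a if q in ("p1", "p3") and a not in (LEFT_END, RIGHT_END) else ""


if __name__ == "__main__":
    run_ab = accepting_run(delta, "p1", {"p3"}, "ab")
    print(cross(run_ab, 1))          # ['p1', 'p2', 'p3']
    print(prod(run_ab, lam, "ab", 2))  # bb
```

Note that productions regroup the output by position rather than by time: concatenating them for positions 1 and 2 gives `aabb`, whereas the transducer outputs `abab`. Only their lengths and letter counts are preserved, which is what the growth arguments rely on.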

## 3 Blind and last pebble transducers

Now, we are ready to define formally the models of blind pebble transducers and last pebble transducers. Intuitively, they correspond to two-way transducers which make a tree of recursive calls to other two-way transducers.

Definition 3.1 (Blind pebble transducer [17]). For $k \geqslant 1$, a *blind $k$-pebble transducer* with input alphabet $A$ and output alphabet $B$ is:

– if $k = 1$, a two-way transducer with input alphabet $A$ and output alphabet $B$;

– if $k \geqslant 2$, a tree $\mathcal{T}\langle \mathcal{B}_1, \cdots, \mathcal{B}_p \rangle$ where the subtrees $\mathcal{B}_1, \dots, \mathcal{B}_p$ are blind $(k-1)$-pebble transducers with input alphabet $A$ and output alphabet $B$; and the root label $\mathcal{T}$ is a two-way transducer with input alphabet $A$ and output alphabet $\{\mathcal{B}_1, \dots, \mathcal{B}_p\}$.

The (total) function f : A<sup>∗</sup> → B<sup>∗</sup> computed by the blind k-pebble transducer of definition 3.1 is built in a recursive fashion, as follows:


Example 3.2. The function $\textsf{unmarked-square} : A^* \to (A \uplus \{\#\})^*, u \mapsto (u\#)^{|u|}$ can be computed by a blind 2-pebble transducer. This machine has shape $\mathcal{T}\langle \mathcal{T}' \rangle$: $\mathcal{T}$ calls $\mathcal{T}'$ on each position $1 \leqslant i \leqslant |u|$ of its input $u$, and $\mathcal{T}'$ outputs $u\#$.
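Seen as a recursive program of height 2, this machine can be sketched as follows in Python. The function names are hypothetical, and plain string operations stand in for the two-way runs; the point is only that the called submachine receives $u$ without any information about the calling position.

```python
def inner(u: str) -> str:
    """Called submachine T': it sees u but not the position of the call."""
    return u + "#"


def unmarked_square(u: str) -> str:
    """Head T: makes one blind call per position of u and concatenates the results."""
    return "".join(inner(u) for _position in range(1, len(u) + 1))


if __name__ == "__main__":
    print(unmarked_square("ab"))   # ab#ab#
```

The output has length $|u|(|u|+1)$, so it grows quadratically, matching the recursion height 2.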

The class of functions computed by a blind $k$-pebble transducer for some $k \geqslant 1$ is called polyblind functions [10]. They form a strict subclass of polyregular functions [8,10,17] which is closed under composition [17, Theorem 6.1].

Now, let us define last pebble transducers. They correspond to blind pebble transducers enhanced with the ability to mark the current position of the input when doing a recursive call. Formally, this position is underlined and we define $u \bullet i := u[1] \cdots u[i-1]\,\underline{u[i]}\,u[i+1] \cdots u[|u|]$ for $u \in A^*$ and $1 \leqslant i \leqslant |u|$.

Definition 3.3 (Last pebble transducer [13]). For $k \geqslant 1$, a *last $k$-pebble transducer* with input alphabet $A$ and output alphabet $B$ is:


The (total) function $f : (A \uplus \underline{A})^* \to B^*$ computed by the last $k$-pebble transducer of definition 3.3 is defined in a recursive fashion, as follows:

– for k = 1, f is the function computed by the two-way transducer;


– for $k \geqslant 2$, let $u \in A^*$ and $(q_1, i_1) \to \cdots \to (q_n, i_n)$ be the accepting run of $\mathcal{T} = (A \uplus \underline{A}, B, Q, q_0, F, \delta, \lambda)$ on $u$. For all $1 \leqslant j \leqslant n$, let $f_j : A^* \to B^*$ be the concatenation of the functions recursively computed by $\lambda(q_j, (\vdash u \dashv)[i_j]) \in \{\mathcal{L}_1, \dots, \mathcal{L}_p\}^*$. Let $\tau : (A \uplus \underline{A})^* \to A^*$ be the morphism which erases the underlining (i.e. $\tau(\underline{a}) = a$), then $f(u) := f_1(\tau(u) \bullet i_1) \cdots f_n(\tau(u) \bullet i_n)$.

The behavior of a last 3-pebble transducer is depicted in fig. 3. Observe that our definition builds a function of type $(A \uplus \underline{A})^* \to B^*$, but we shall in fact consider its restriction to $A^*$ (the marks are only used within the induction step).

*Example 3.4 ([1]).* The function $\textsf{square} : u \mapsto (u \bullet 1)\# \cdots (u \bullet |u|)\#$ can be computed by a last 2-pebble transducer, which successively marks and makes recursive calls in positions $1$, $2$, etc. However this function is not polyblind [17].
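The same kind of recursive-program sketch works for square, the only change being that the called submachine now receives the marked word $u \bullet i$. In the Python sketch below, the underlined letter is rendered in upper case; this encoding assumes a lower-case input alphabet and, like the function names, is a convention of the illustration only.

```python
def mark(u: str, i: int) -> str:
    """u • i: the word u whose i-th letter (1-based) is marked, shown in upper case."""
    return u[: i - 1] + u[i - 1].upper() + u[i:]


def inner(marked_u: str) -> str:
    """Called submachine: copies its marked input, followed by a separator."""
    return marked_u + "#"


def square(u: str) -> str:
    """Head: marks positions 1, 2, ... in turn and calls the submachine on each."""
    return "".join(inner(mark(u, i)) for i in range(1, len(u) + 1))


if __name__ == "__main__":
    print(square("ab"))   # Ab#aB#
```

Unlike the blind case, the copies differ from one another because each of them records the position of its call.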

We are ready to state our main result. Its proof goes over sections 4 to 6.

Theorem 3.5 (Minimization of the recursion height). Let $1 \leqslant \ell \leqslant k$. Let $f : A^* \to B^*$ be computed by a blind $k$-pebble transducer (resp. by a last $k$-pebble transducer). Then $f$ can be computed by a blind $\ell$-pebble transducer (resp. by a last $\ell$-pebble transducer) if and only if $|f(u)| = O(|u|^\ell)$. This property is decidable and the construction is effective.

As an easy consequence, the class of functions computed by last pebble transducers forms a strict subclass of the polyregular functions (because theorem 3.5 does not hold for the full model of pebble transducers [3, Theorem 6.3]) and therefore it is not closed under composition (because any polyregular function can be obtained as a composition of regular functions and squares [1]).

Even though a (non-effective) version of theorem 3.5 was already known for blind pebble transducers [17, Theorem 7.1], we shall first present our proof of this case. Indeed, it is a new proof (relying on factorization forests) which is simpler than the original one. Furthermore, understanding the techniques used is a key step for understanding the proof for last pebble transducers presented afterwards.

## 4 Factorization forests

In this section, we introduce the key tool of factorization forests. Given a monoid morphism $\mu : A^* \to M$ and $u \in A^*$, a $\mu$-factorization forest of $u$ is an unranked tree structure defined as follows. We use the brackets $\langle \cdots \rangle$ to build a tree.

Definition 4.1 (Factorization forest [19]). Given a morphism $\mu : A^* \to M$ and $u \in A^*$, we say that $\mathcal{F}$ is a *$\mu$-forest* of $u$ if:


We use the standard tree vocabulary of height, child, sibling, descendant and ancestor (a node being itself one of its ancestors/descendants), etc. We denote by $\mathsf{Nodes}^{\mathcal{F}}$ the set of nodes of $\mathcal{F}$. In order to simplify the statements, we identify a node $t \in \mathsf{Nodes}^{\mathcal{F}}$ with the subtree rooted in this node. Thus $\mathsf{Nodes}^{\mathcal{F}}$ can also be seen as the set of subtrees of $\mathcal{F}$, and $\mathcal{F} \in \mathsf{Nodes}^{\mathcal{F}}$. We say that a node is idempotent if it has at least 3 children. We denote by $\mathsf{Forests}_\mu(u)$ (resp. $\mathsf{Forests}^d_\mu(u)$) the set of $\mu$-forests of $u \in A^*$ (resp. $\mu$-forests of $u \in A^*$ of height at most $d$). We write $\mathsf{Forests}_\mu$ and $\mathsf{Forests}^d_\mu$ for the corresponding sets of all forests (of any word).

A $\mu$-forest of $u \in A^*$ can also be seen as "the word $u$ with brackets" in definition 4.1. Therefore $\mathsf{Forests}_\mu$ can be seen as a language over $\widehat{A} := A \uplus \{\langle, \rangle\}$. In this setting, it is well-known that $\mu$-forests of bounded height can effectively be computed by a rational function, i.e. a particular case of regular function that can be computed by a non-deterministic one-way transducer (see e.g. [8]).

Theorem 4.2 (Simon [19,6]). Given a morphism $\mu : A^* \to M$ into a finite monoid $M$, one can effectively build a rational function $\mathsf{forest}_\mu : A^* \to (\widehat{A})^*$ such that for all $u \in A^*$, $\mathsf{forest}_\mu(u) \in \mathsf{Forests}^{3|M|}_\mu(u)$.

Building $\mu$-forests of bounded height is especially useful for us, since it enables us to decompose any word in a bounded way. This decomposition will be guided by the following definitions, which have been introduced in [8,10]. First, we define iterable nodes as the middle children of idempotent nodes.

Definition 4.3. Let $\mathcal{F} \in \mathsf{Forests}_\mu(u)$. Its *iterable nodes*, denoted $\mathsf{Iter}^{\mathcal{F}}$, are:

– if $\mathcal{F} = a \in A$ or $\mathcal{F} = \varepsilon$, then $\mathsf{Iter}^{\mathcal{F}} := \varnothing$;

– otherwise if $\mathcal{F} = \langle \mathcal{F}_1, \cdots, \mathcal{F}_n \rangle$, then:

$$\mathsf{Iter}^{\mathcal{F}} := \{ \mathcal{F}_i : 2 \leqslant i \leqslant n - 1 \} \cup \bigcup_{1 \leqslant i \leqslant n} \mathsf{Iter}^{\mathcal{F}_i}.$$

Now, we define the notion of skeleton of a node t, which contains all the descendants of t except those which are iterable.

Definition 4.4 (Skeleton, frontier). Let $\mathcal{F} \in \mathsf{Forests}_\mu(u)$ and $t \in \mathsf{Nodes}^{\mathcal{F}}$. We define the *skeleton* of $t$, denoted $\mathsf{Skel}^{\mathcal{F}}(t)$, by:

– if $t = a \in A$ is a leaf, then $\mathsf{Skel}^{\mathcal{F}}(t) := \{t\}$;

– otherwise if $t = \langle \mathcal{F}_1, \cdots, \mathcal{F}_n \rangle$, then $\mathsf{Skel}^{\mathcal{F}}(t) := \{t\} \cup \mathsf{Skel}^{\mathcal{F}}(\mathcal{F}_1) \cup \mathsf{Skel}^{\mathcal{F}}(\mathcal{F}_n)$.

The *frontier* of $t$ is the set $\mathsf{Fr}^{\mathcal{F}}(t) \subseteq [1{:}|u|]$ containing the positions of $u$ which belong to $\mathsf{Skel}^{\mathcal{F}}(t)$ (when seen as leaves of the $\mu$-forest $\mathcal{F}$ over $u$).

*Example 4.5.* Let $M := (\{-1, 1, 0\}, \times)$ and $\mu : M^* \to M$ the product. A $\mu$-forest $\mathcal{F}$ of the word $(-1)(-1)0(-1)000000$ is depicted in Figure 4. Double lines denote idempotent nodes. The set of blue nodes is the skeleton of the topmost blue node.

It is easy to observe that for $\mathcal{F} \in \mathsf{Forests}^d_\mu(u)$, the size of a skeleton, or of a frontier, is bounded independently from $\mathcal{F}$. Furthermore, the set of skeletons $\{\mathsf{Skel}^{\mathcal{F}}(t) : t \in \mathsf{Iter}^{\mathcal{F}} \cup \{\mathcal{F}\}\}$ is a partition of $\mathsf{Nodes}^{\mathcal{F}}$ [8, Lemma 33]. As a consequence, the set of frontiers $\{\mathsf{Fr}^{\mathcal{F}}(t) : t \in \mathsf{Iter}^{\mathcal{F}} \cup \{\mathcal{F}\}\}$ is a partition of $[1{:}|u|]$. Given a position $1 \leqslant i \leqslant |u|$, we can thus define the origin of $i$ in $\mathcal{F}$, denoted $\mathsf{origin}^{\mathcal{F}}(i)$, as the unique $t \in \mathsf{Iter}^{\mathcal{F}} \cup \{\mathcal{F}\}$ such that $i \in \mathsf{Fr}^{\mathcal{F}}(t)$.
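The partition property can be checked mechanically on a small tree. The Python sketch below represents a forest as nested lists with one-letter strings as leaves, and identifies each node with its path from the root (a tuple of child indices), so that equal subtrees at different places stay distinct. These encodings, the function names and the example tree are assumptions of the sketch; in particular, it only looks at the shape of the tree and does not check the monoid conditions of definition 4.1.

```python
def subtree(forest, path):
    """Subtree reached from the root by following the child indices in path."""
    for k in path:
        forest = forest[k]
    return forest


def all_nodes(forest, path=()):
    """All nodes, as paths, in left-to-right depth-first order."""
    yield path
    if isinstance(forest, list):
        for k, child in enumerate(forest):
            yield from all_nodes(child, path + (k,))


def iterable_nodes(forest, path=()):
    """Iter: the middle children of every inner node, collected recursively."""
    if not isinstance(forest, list):
        return set()
    result = {path + (k,) for k in range(1, len(forest) - 1)}
    for k, child in enumerate(forest):
        result |= iterable_nodes(child, path + (k,))
    return result


def skeleton(forest, path):
    """Skel(t): t together with the skeletons of its first and last children."""
    t = subtree(forest, path)
    result = {path}
    if isinstance(t, list) and t:
        result |= skeleton(forest, path + (0,))
        result |= skeleton(forest, path + (len(t) - 1,))
    return result


def frontier(forest, path):
    """Fr(t): 1-based positions of the leaves that belong to Skel(t)."""
    leaves = [p for p in all_nodes(forest) if not isinstance(subtree(forest, p), list)]
    position = {p: k + 1 for k, p in enumerate(leaves)}
    return {position[p] for p in skeleton(forest, path) if p in position}


if __name__ == "__main__":
    F = ["a", ["b", "c", "d", "e"], "f"]          # hypothetical forest over "abcdef"
    roots = iterable_nodes(F) | {()}
    print(sorted(sorted(frontier(F, t)) for t in roots))   # [[1, 6], [2, 5], [3], [4]]
```

On this example the four frontiers are pairwise disjoint and cover positions 1 to 6, as the partition property predicts.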

Figure 4: $\mathcal{F} \in \mathsf{Forests}_\mu((-1)(-1)0(-1)000000)$ and a skeleton.

Definition 4.6 (Observation). Let $\mathcal{F} \in \mathsf{Forests}_\mu$ and $t, t' \in \mathsf{Nodes}^{\mathcal{F}}$. We say that $t$ *observes* $t'$ if either $t'$ is an ancestor of $t$, or $t'$ is the immediate right or left sibling of an ancestor of $t$.

Figure 5: Nodes that observe • and that • observes

The intuition behind the notion of observation (which is not symmetrical) is depicted in fig. 5. Note that in a forest of bounded height, the number of nodes that some $t$ observes is bounded. This will be a key argument in the following. We say that $t$ and $t'$ are dependent if either $t$ observes $t'$ or the converse. Given $\mathcal{F}$, we can translate these notions to the positions of $u$: we say that $i$ observes (resp. depends on) $i'$ if $\mathsf{origin}^{\mathcal{F}}(i)$ observes (resp. depends on) $\mathsf{origin}^{\mathcal{F}}(i')$.
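The bound on observed nodes can be made explicit: a node observes its ancestors (itself included) and their immediate siblings, hence at most three nodes per level. The Python sketch below computes this set on a forest encoded as nested lists, with nodes identified by their path from the root; the encoding, the function names and the example tree are assumptions of the illustration.

```python
def subtree(forest, path):
    """Subtree reached from the root by following the child indices in path."""
    for k in path:
        forest = forest[k]
    return forest


def observed_by(forest, path):
    """Nodes observed by the node at path: its ancestors and their immediate siblings."""
    seen = set()
    for depth in range(len(path) + 1):
        ancestor = path[:depth]          # a node is one of its own ancestors
        seen.add(ancestor)
        if depth == 0:
            continue                     # the root has no sibling
        parent, k = ancestor[:-1], ancestor[-1]
        for s in (k - 1, k + 1):         # immediate left and right siblings
            if 0 <= s < len(subtree(forest, parent)):
                seen.add(parent + (s,))
    return seen


if __name__ == "__main__":
    F = ["a", ["b", "c", "d", "e"], "f"]   # hypothetical forest shape
    print(len(observed_by(F, (1, 1))))     # 7
    print(observed_by(F, ()))              # {()}
```

Here the node at path `(1, 1)` observes the root, while the root only observes itself, which illustrates that the relation is not symmetrical.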

## 5 Height minimization of blind pebble transducers

In this section, we show theorem 3.5 for blind pebble transducers. We say that a two-way transducer $\mathcal{T}$ is a submachine of a blind pebble transducer $\mathcal{B}$ if $\mathcal{T}$ labels a node in the tree description of $\mathcal{B}$. If $\mathcal{B} = \mathcal{T}\langle \mathcal{B}_1, \dots, \mathcal{B}_n \rangle$, we say that the submachine $\mathcal{T}$ is the head of $\mathcal{B}$. We let the transition morphism of $\mathcal{B}$ be the cartesian product of all the transition morphisms of all the submachines of $\mathcal{B}$. Observe that it makes sense to consider the production of a submachine $\mathcal{T}$ in a context defined using the transition morphism of $\mathcal{B}$.

### 5.1 Pumpability

We first give a sufficient condition, named pumpability, for a blind $k$-pebble transducer to compute a function $f$ such that $|f(u)| \neq O(|u|^{k-1})$. The behavior of a pumpable blind 2-pebble transducer is depicted in fig. 6 over a well-chosen input: it has a factor in which the head $\mathcal{T}_1$ calls a submachine $\mathcal{T}_2$, and a factor in which $\mathcal{T}_2$ produces a non-empty output. Furthermore both factors can be iterated without destroying the runs of these machines (due to idempotents).

Definition 5.1. Let $\mathcal{B}$ be a blind $k$-pebble transducer whose transition morphism is $\mu : A^* \to \mathbb{T}$. We say that the transducer $\mathcal{B}$ is *pumpable* if there exist:


– $a_1, \dots, a_k \in A$ such that for all $1 \leqslant j \leqslant k$, $e_j := \ell_j \mu(a_j) r_j$ is an idempotent;

– a permutation $\sigma : [1{:}k] \to [1{:}k]$;

such that if $\mathcal{M}_i^j := m_i e_{i+1} m_{i+1} \cdots e_j m_j$ for all $0 \leqslant i \leqslant j \leqslant k$, and if we define the following context for all $1 \leqslant j \leqslant k$:

$$\mathcal{C}_j := \mathcal{M}_0^{\sigma(j)-1} e_{\sigma(j)} \ell_{\sigma(j)} \llbracket a_{\sigma(j)} \rrbracket r_{\sigma(j)} e_{\sigma(j)} \mathcal{M}_{\sigma(j)}^k$$

then for all $1 \leqslant j \leqslant k-1$, $|\mathsf{prod}_{\mathcal{T}_j}(\mathcal{C}_j)|_{\mathcal{T}_{j+1}} \neq 0$, and $\mathsf{prod}_{\mathcal{T}_k}(\mathcal{C}_k) \neq \varepsilon$.

Figure 6: Pumpability in a blind 2-pebble transducer.

Lemma 5.2 follows by choosing inverse images in $A^*$ for the $m_i$, $\ell_i$ and $r_i$.

Lemma 5.2. Let $f$ be computed by a pumpable blind $k$-pebble transducer. There exist words $v_0, \dots, v_k, u_1, \dots, u_k$ such that $|f(v_0 u_1^X \cdots u_k^X v_k)| = \Theta(X^k)$.
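As a hand-made illustration of this growth rate, consider the function unmarked-square of example 3.2 with $k = 2$. The words below are chosen by hand for this sketch (they are not derived from definition 5.1): take $v_0 = v_1 = v_2 = \varepsilon$ and $u_1 = u_2 = a$ for some letter $a \in A$. Since $|\textsf{unmarked-square}(u)| = |u|(|u|+1)$, we get:

$$|\textsf{unmarked-square}(a^X a^X)| = 2X(2X+1) = 4X^2 + 2X = \Theta(X^2).$$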

Now, we use pumpability as a key ingredient for showing theorem 3.5, which directly follows by induction from the more precise theorem 5.3.

Theorem 5.3 (Removing one layer). Let $k \geqslant 2$ and $f : A^* \to B^*$ be computed by a blind $k$-pebble transducer $\mathcal{B}$. The following are equivalent:

1. $|f(u)| = O(|u|^{k-1})$;

2. $\mathcal{B}$ is not pumpable;

3. $f$ can be computed by a blind $(k-1)$-pebble transducer.

Furthermore, this property is decidable and the construction is effective.

Proof. Item 3 ⇒ item 1 is obvious. Item 1 ⇒ item 2 is the contrapositive of lemma 5.2. Furthermore, pumpability can be tested by an enumeration of $\mu(A^*)$ and $A$. It remains to show item 2 ⇒ item 3 (in an effective fashion): this is the purpose of section 5.2.

### 5.2 Algorithm for removing a recursion layer

Let $k \geqslant 2$ and $\mathcal{U}$ be a blind $k$-pebble transducer that is not pumpable, and that computes $f : A^* \to B^*$. We build a blind $(k-1)$-pebble transducer $\mathcal{U}'$ for $f$.

Let $\mu : A^* \to \mathbb{T}$ be the transition morphism of $\mathcal{U}$. We shall consider that, on input $u \in A^*$, the submachines of $\mathcal{U}'$ can in fact use $\mathsf{forest}_\mu(u) \in (\widehat{A})^*$ as input. Indeed $\mathsf{forest}_\mu$ is a rational function (by theorem 4.2), hence its information can be recovered by using a lookaround. Informally, the lookaround feature enables a two-way transducer to choose its transitions not only depending on its current state and current letter $u[i]$ in position $1 \leqslant i \leqslant |u|$, but also on a regular property of the prefix $u[1{:}i-1]$ and the suffix $u[i+1{:}|u|]$. It is well-known that given a two-way transducer $\mathcal{T}$ with lookarounds, one can build an equivalent $\mathcal{T}'$ that does not have this feature (see e.g. [15,12]). Furthermore, even if the accepting runs of $\mathcal{T}$ and $\mathcal{T}'$ may differ, they produce the same outputs from the same positions (this observation will be critical for last pebble transducers, in order to ensure that the marked positions of the recursive calls will be preserved).

Now, we describe the two-way transducers that are the submachines of the new transducer U′. First, it has submachines old-T for T a submachine of U, which are described in algorithm 1. Intuitively, old-T is just a copy of T. It is clear that if T is a submachine of U, then old-T(u) is the concatenation of the outputs produced by (the recursive calls of) T along its accepting run on u.



The new transducer U′ also has submachines accelerate-T for T a submachine of U, which are described in algorithm 2. Intuitively, accelerate-T simulates T while trying to inline recursive calls in its own run. More precisely, let u ∈ A<sup>∗</sup> be the input and F := forest<sub>μ</sub>(u). If T calls B′ in a position 1 ≤ i ≤ |u| that belongs to the frontier of the root node F of F, then accelerate-T inlines the behavior of the head of B′. Otherwise it makes a recursive call, except if B′ is a leaf of U. Hence if T is a submachine of U which is not a leaf, accelerate-T(u) is the concatenation of the outputs produced by the calls of T along its accepting run.


Finally, the new transducer U′ is obtained by defining accelerate-T to be its head, where T is the head of U. Furthermore, we remove the submachines old-T or accelerate-T which are never called. Observe that U′ indeed computes the function f. Furthermore, we observe that U′ has recursion height (i.e. the number of nested Call instructions, plus 1 for the head) k−1, since each inlining of lines 9, 10 and 12 in algorithm 2 removes exactly one recursion layer of U.

It remains to justify that each accelerate-T can be implemented by a two-way transducer (i.e. with lookarounds but a bounded memory). We represent variable i by the current position of the transducer. Since it has access to F, the lookaround can be used to check whether i ∈ Fr<sup>F</sup>(F) or not (since the size of Fr<sup>F</sup>(F) is bounded). It remains to explain how the inlinings are performed:

– if i ∈ Fr<sup>F</sup>(F), the two-way transducer inlines old-T′ by executing the same moves and calls as T′ does. Once its computation is ended, it has to go back to position i. This is indeed possible since belonging to Fr<sup>F</sup>(F) is a property that can be detected by using the lookaround, hence the machine only needs to remember the rank of i within Fr<sup>F</sup>(F) (which is bounded);

– else if B′ = T′ is a blind 1-pebble transducer, we produce the output of T′ without moving. This is possible since for all i′ ∉ Fr<sup>F</sup>(F), prod<sup>u</sup><sub>T′</sub>(i′) = ε (hence the output of T′ on u is bounded, and its value can be determined without moving, just by using the lookaround). Indeed, if prod<sup>u</sup><sub>T′</sub>(i′) ≠ ε for such an i′ ∉ Fr<sup>F</sup>(F) when reaching line 12 of algorithm 2, then the conditions of lemma 5.4 hold, which yields a contradiction. This lemma is the key argument of this proof, relying on the non-pumpability of U.

Lemma 5.4 (Key lemma). Let u ∈ A<sup>∗</sup> and F ∈ Forests<sub>μ</sub>(u). Assume that there exists a sequence T<sub>1</sub>, ..., T<sub>k</sub> of submachines of U and a sequence of positions 1 ≤ i<sub>1</sub>, ..., i<sub>k</sub> ≤ |u| such that:

– T<sub>1</sub> is the head of U;

– for all 1 ≤ j ≤ k−1, |prod<sup>u</sup><sub>T<sub>j</sub></sub>(i<sub>j</sub>)|<sub>T<sub>j+1</sub></sub> ≠ 0 and prod<sup>u</sup><sub>T<sub>k</sub></sub>(i<sub>k</sub>) ≠ ε;

– for all 1 ≤ j ≤ k, i<sub>j</sub> ∉ Fr<sup>F</sup>(F) (i.e. origin<sup>F</sup>(i<sub>j</sub>) ∈ Iter<sup>F</sup>).

Then U is pumpable.

Proof (idea). We first observe that pumpability follows as soon as the nodes origin<sup>F</sup>(i<sub>j</sub>) are pairwise independent. We then show that this independence condition can always be obtained, up to duplicating some iterable subtrees of F (and some factors of u), because the behavior of a submachine in a blind pebble transducer does not depend on the positions of the above recursive calls.

## 6 Height minimization of last pebble transducers

In this section, we show theorem 3.5 for last pebble transducers. The notions of submachine, head and transition morphism for a last pebble transducer are defined as in section 5. The transition morphism is now defined over (A ∪ <u>A</u>)<sup>∗</sup>.

## 6.1 Pumpability

The sketch of the proof is similar to section 5. We first give an equivalent of pumpability for last pebble transducers. The intuition behind this notion is depicted in fig. 7. The formal definition is however more cumbersome, since we need to keep track of the fact that the calling position is marked.

Definition 6.1. Let L be a last k-pebble transducer whose transition morphism is μ : (A ∪ <u>A</u>)<sup>∗</sup> → T. We say that the transducer L is *pumpable* if there exists:


*such that if we let* M<sub>i</sub><sup>j</sup> := m<sub>i</sub> e<sub>i+1</sub> m<sub>i+1</sub> ··· e<sub>j</sub> m<sub>j</sub> *for all* 0 ≤ i ≤ j ≤ k*, and if we define the following context:*

$$\mathcal{C}_1 := \mathcal{M}_0^{\sigma(1)-1} e_{\sigma(1)} \ell_{\sigma(1)} [\![ a_{\sigma(1)} ]\!] r_{\sigma(1)} e_{\sigma(1)} \mathcal{M}_{\sigma(1)}^k$$

*and for all* 1 ≤ j ≤ k−1 *the context:*

$$\begin{split} \mathcal{C}_{j+1} &:= \mathcal{M}_0^{\sigma(j)-1} e_{\sigma(j)} \ell_{\sigma(j)} \mu(\underline{a_{\sigma(j)}}) r_{\sigma(j)} e_{\sigma(j)} \mathcal{M}_{\sigma(j)}^{\sigma(j+1)-1} \\ &\qquad e_{\sigma(j+1)} \ell_{\sigma(j+1)} [\![ a_{\sigma(j+1)} ]\!] r_{\sigma(j+1)} e_{\sigma(j+1)} \mathcal{M}_{\sigma(j+1)}^k \quad \text{if } \sigma(j) < \sigma(j+1); \\ \mathcal{C}_{j+1} &:= \mathcal{M}_0^{\sigma(j+1)-1} e_{\sigma(j+1)} \ell_{\sigma(j+1)} [\![ a_{\sigma(j+1)} ]\!] r_{\sigma(j+1)} e_{\sigma(j+1)} \\ &\qquad \mathcal{M}_{\sigma(j+1)}^{\sigma(j)-1} e_{\sigma(j)} \ell_{\sigma(j)} \mu(\underline{a_{\sigma(j)}}) r_{\sigma(j)} e_{\sigma(j)} \mathcal{M}_{\sigma(j)}^k \quad \text{otherwise}; \end{split}$$

*then for all* 1 ≤ j ≤ k−1*,* |prod<sub>T<sub>j</sub></sub>(C<sub>j</sub>)|<sub>T<sub>j+1</sub></sub> ≠ 0*, and* prod<sub>T<sub>k</sub></sub>(C<sub>k</sub>) ≠ ε*.*

Figure 7: Pumpability in a last 2-pebble transducer.

We obtain lemma 6.2 by a proof which is similar to that of lemma 5.2.

Lemma 6.2. *Let* f *be computed by a pumpable last* k*-pebble transducer. There exist words* v<sub>0</sub>, ..., v<sub>k</sub>, u<sub>1</sub>, ..., u<sub>k</sub> *such that* |f(v<sub>0</sub> u<sub>1</sub><sup>X</sup> ··· u<sub>k</sub><sup>X</sup> v<sub>k</sub>)| = Θ(X<sup>k</sup>)*.*

Theorem 6.3 (Removing one layer). *Let* k ≥ 2 *and* f : A<sup>∗</sup> → B<sup>∗</sup> *be computed by a last* k*-pebble transducer* L*. The following are equivalent:*

1. |f(u)| = O(|u|<sup>k−1</sup>)*;*

2. L *is not pumpable;*

3. f *can be computed by a last* (k−1)*-pebble transducer.*

*Furthermore, this property is decidable and the construction is effective.*

*Proof.* Item 3 ⇒ item 1 is obvious. Item 1 ⇒ item 2 is lemma 6.2. Furthermore, pumpability can be tested by an enumeration of μ(A∗) and A. It remains to show item 2 ⇒ item 3 (in an effective fashion): this is the purpose of section 6.2.

## 6.2 Algorithm for removing a recursion layer

Let k ≥ 2 and U be a last k-pebble transducer that is not pumpable, and that computes f : A<sup>∗</sup> → B<sup>∗</sup>. We build a last (k−1)-pebble transducer U′ for f. Let μ : (A ∪ <u>A</u>)<sup>∗</sup> → T be the transition morphism of U. As before (using a lookaround), the submachines of U′ have access to forest<sub>μ</sub>(u) on input u ∈ A<sup>∗</sup>.

Now, we describe the submachines of the new transducer U′. It has submachines old-T-along-ρ for T a submachine of U and ρ a run of T, which are described in algorithm 3. Intuitively, these machines mimic the behavior of T along the run ρ (which is not necessarily accepting) of T over v with v ∈ (A ∪ <u>A</u>)<sup>∗</sup>.

Since they are indexed by a run ρ, it may seem that we create an infinite number of submachines, but this will not be the case. Indeed, a run ρ will be represented by its first configuration (q<sub>1</sub>, i<sub>1</sub>) and last configuration (q<sub>n</sub>, i<sub>n</sub>). This information is sufficient to simulate exactly the two-way moves of ρ, but there is still unbounded information: the positions i<sub>1</sub> and i<sub>n</sub>. In fact, the input will be of the form v = u•i and we shall guarantee that i<sub>1</sub> and i<sub>n</sub> can be detected by the lookaround if i is marked. Hence the run ρ will be represented in a bounded way, independently from the input v, and so that its first and last configurations can be detected by the lookaround of the submachine.

It follows from algorithm 3 that if T is a submachine of U, then for all v ∈ (A ∪ <u>A</u>)<sup>∗</sup> and ρ run of T on v, old-T-along-ρ(v) is the concatenation of the outputs produced by (the recursive calls of) T along ρ.

We also define a submachine normal-T-along-ρ-pebble-i that is similar to old-T-along-ρ, except that it ignores the mark of its input and acts as if the mark was in position i (as above for ρ, i will be encoded by bounded information).


The new transducer U′ also has submachines accelerate-T-along-ρ for T a submachine of U, which are described in algorithm 4. Intuitively, accelerate-T-along-ρ simulates T along ρ while trying to inline some recursive calls. Whenever it is in position i and needs to call recursively L′ whose head is T′, it first slices the accepting run ρ′ of T′ on u•i, with respect to forest<sub>μ</sub>(u) and i, as explained in definition 6.4 and depicted in fig. 8. Intuitively, this operation splits ρ′ into a bounded number of runs whose positions either all observe i, or i observes all of them, or none of these cases occur (the positions are either 0, |u|+1 or independent of i).

Definition 6.4 (Slicing). Let u ∈ A<sup>∗</sup>, F ∈ Forests<sub>μ</sub>(u) and 1 ≤ i ≤ |u|. We let ↑i (resp. ↓i) be the set of positions that i observes (resp. that observe i). Let ρ = (q<sub>1</sub>, i<sub>1</sub>) → ··· → (q<sub>n</sub>, i<sub>n</sub>) be a run of a two-way transducer T on u•i. We build by induction a sequence ℓ<sub>1</sub>, ..., ℓ<sub>N+1</sub> with ℓ<sub>1</sub> := 1 and:


Finally the *slicing* of ρ, with respect to F and i, is the sequence of runs ρ<sub>1</sub>, ..., ρ<sub>N</sub> where ρ<sub>j</sub> := (q<sub>ℓ<sub>j</sub></sub>, i<sub>ℓ<sub>j</sub></sub>) → (q<sub>ℓ<sub>j</sub>+1</sub>, i<sub>ℓ<sub>j</sub>+1</sub>) → ··· → (q<sub>ℓ<sub>j+1</sub>−1</sub>, i<sub>ℓ<sub>j+1</sub>−1</sub>).

Figure 8: Slicing of a run ρ with respect to i and F.

Now, let ρ′<sub>1</sub>, ..., ρ′<sub>N</sub> be the slicing of the run ρ′ of T′ on the input u•i. For all 1 ≤ j ≤ N, there are mainly two cases. Either the positions of ρ′<sub>j</sub> are all in ↑i or ↓i. In this case, accelerate-T-along-ρ directly inlines old-T′-along-ρ′<sub>j</sub> within its own run (i.e. without making a recursive call). Otherwise, it makes a recursive call to accelerate-T′-along-ρ′<sub>j</sub>, except if L′ is a leaf of U (thus L′ = T′).

Finally, the new transducer U′ is described as follows: on input u ∈ A<sup>∗</sup>, its head is the submachine accelerate-T-along-ρ(u), where T is the head of U and ρ is the accepting run of T on u (represented by the bounded information that it is both initial and final). As before, we remove the submachines which are never called in U′. Observe that we have created a machine with recursion height k−1 (because line 17 in algorithm 4 prevents calling a k-th layer).

Let us justify that each accelerate-T-along-ρ can indeed be implemented by a two-way transducer. First, let us observe that since F has bounded height, the number N of slices given in line 7 of algorithm 4 is bounded. Furthermore, we claim that the first and last positions of each ρ′<sub>j</sub> belong to a given set of bounded size, which can be detected by a lookaround which has access to i. For the ρ′<sub>j</sub> whose positions are in ↑i, this is clear since |↑i| is bounded (because the frontier of any node is bounded). For ↓i ∖ ↑i we use lemma 6.5, which implies that this set is a bounded union of intervals. The last case is very similar.

Lemma 6.5. Let 1 ≤ i ≤ |u|, t := origin<sup>F</sup>(i) and t<sub>1</sub> (resp. t<sub>2</sub>) be its immediate left (resp. right) sibling (they exist whenever t ∈ Iter<sup>F</sup>, i.e. here t ≠ F). Then:

$${\downarrow} i \setminus {\uparrow} i = [\min(\mathsf{Fr}^{\mathcal{F}}(\mathfrak{t}_1)) : \max(\mathsf{Fr}^{\mathcal{F}}(\mathfrak{t}_2))] \setminus \{\mathsf{Fr}^{\mathcal{F}}(\mathfrak{t}_1), \mathsf{Fr}^{\mathcal{F}}(\mathfrak{t}), \mathsf{Fr}^{\mathcal{F}}(\mathfrak{t}_2)\}.$$

This analysis justifies why each ρ′<sub>j</sub> can be encoded in a bounded way. Now, we show how to implement the inlinings while using i as the current position:

– if i<sub>1</sub>, ..., i<sub>n</sub> ∈ ↑i, then n is bounded (because |↑i| is bounded). We can thus inline old-T′-along-ρ′<sub>j</sub>(u•i) while staying in position i. However, when T′ calls some L″ (of head T″) on position i′, we would need to call old-T″-along-ρ″(u•i′) (where ρ″ is the accepting run of T″ along u•i′). But we cannot do this operation, since we are in position i and not in i′. The solution is that the inlined code calls normal-T″-along-ρ″-pebble-i′(u•i) instead, which simulates an accepting run ρ″ of T″ on u•i′, even if its input is u•i. Note that i′ can be represented as bounded information and recovered by a lookaround given u•i as input, since i observes i′;

– if i<sub>1</sub>, ..., i<sub>n</sub> ∈ ↓i ∖ ↑i, then the nodes origin<sup>F</sup>(i<sub>1</sub>), ..., origin<sup>F</sup>(i<sub>n</sub>) are roughly below origin<sup>F</sup>(i) in F (see fig. 5). We inline old-T′-along-ρ′<sub>j</sub>(u•i), by moving along i<sub>1</sub>, ..., i<sub>n</sub> as ρ′<sub>j</sub> does. We can keep track of the height of origin<sup>F</sup>(i) above the current origin<sup>F</sup>(i′) (it is bounded information). With the lookaround, we can detect the end of ρ′<sub>j</sub>, and go back to position i.

It remains to justify that the new transducer U′ is correct. For this, we only need to show that when it reaches line 18 in algorithm 4, the output of T′ along ρ′<sub>j</sub> is indeed empty. Otherwise, the conditions of lemma 6.6 would hold (since we never execute two successive recursive calls in dependent positions), which yields a contradiction.

Lemma 6.6 (Key lemma). Let u ∈ A<sup>∗</sup> and F ∈ Forests<sub>μ</sub>(u). Assume that there exists a sequence T<sub>1</sub>, ..., T<sub>k</sub> of submachines of U and a sequence of positions 1 ≤ i<sub>1</sub>, ..., i<sub>k</sub> ≤ |u| such that:

– T<sub>1</sub> is the head of U;

– |prod<sup>u</sup><sub>T<sub>1</sub></sub>(i<sub>1</sub>)|<sub>T<sub>2</sub></sub> ≠ 0 and prod<sup>u•i<sub>k−1</sub></sup><sub>T<sub>k</sub></sub>(i<sub>k</sub>) ≠ ε;

– for all 2 ≤ j ≤ k−1, |prod<sup>u•i<sub>j−1</sub></sup><sub>T<sub>j</sub></sub>(i<sub>j</sub>)|<sub>T<sub>j+1</sub></sub> ≠ 0;

– for all 1 ≤ j ≤ k−1, origin<sup>F</sup>(i<sub>j</sub>) and origin<sup>F</sup>(i<sub>j+1</sub>) are independent.

Then U is pumpable.

Proof (idea). As for lemma 5.4, the key observation is that pumpability follows as soon as the nodes origin<sup>F</sup>(i<sub>j</sub>) are pairwise independent. Furthermore, this condition can be obtained by duplicating some nodes in F.

## 7 Making the two last pebbles visible

We can define a similar model to that of last k-pebble transducer, which sees the two last calling positions instead of only the previous one. Let us name this model a last-last k-pebble transducer. A very natural question is to know whether we can show an analog of theorem 3.5 for these machines.

Note that for k = 1, 2 and 3, a last-last k-pebble transducer is exactly the same as a k-pebble transducer. Hence the function inner-squaring of page 2 is such that |inner-squaring(u)| = O(|u|<sup>2</sup>) and can be computed by a last-last 3-pebble transducer, but it cannot be computed by a last-last 2-pebble transducer. It follows that the connection between minimal recursion height and growth of the output fails. However, this result is somewhat artificial. Indeed, a last-last 2-pebble transducer is a degenerate case, since it can only see one last pebble. More interestingly, we show that the connection fails for arbitrary heights.

Theorem 7.1. For all k ≥ 2, there exists a function f : A<sup>∗</sup> → B<sup>∗</sup> such that |f(u)| = O(|u|<sup>2</sup>) and that can be computed by a last-last (2k+1)-pebble transducer, but not by a last-last 2k-pebble transducer.

Proof (idea). We re-use a counterexample introduced by Bojańczyk in [2] to show a similar failure result for the model of k-pebble transducers.

## 8 Outlook

This paper somehow settles the discussion concerning the variants of pebble transducers for which the minimal recursion height only depends on the growth of the output. As soon as two marks are visible, the combinatorics of the output also has to be taken into account, hence minimizing the recursion height in this case (e.g. for last-last pebble transducers) seems hard with the current tools. As observed in [13], one can extend last pebble transducers by allowing the recursion height to be unbounded (in the spirit of marble transducers [9]). This model can produce outputs whose size grows exponentially in the size of the input. A natural question is to know whether a function computed by this model, but whose output size is polynomial, can in fact be computed with a recursion stack of bounded height (i.e. by a last k-pebble transducer).

Acknowledgements. The author is grateful to Tito Nguyên for suggesting the study of the recursion height for last pebble transducers.

## References



## **Fixed Points and Noetherian Topologies**

Aliaume Lopez<sup>1,2</sup>

<sup>1</sup> Université Paris Cité, CNRS, IRIF, F-75013, Paris, France alopez@irif.fr

<sup>2</sup> Université Paris-Saclay, CNRS, ENS Paris-Saclay, Laboratoire Méthodes Formelles, 91190, Gif-sur-Yvette, France.

**Abstract.** Noetherian spaces are a generalisation of well-quasi-orderings to topologies, that can be used to prove termination of programs. They find applications in the verification of transition systems, some of which are better described using topology. The goal of this paper is to allow the systematic description of computations using inductively defined datatypes via Noetherian spaces. This is achieved through a fixed point theorem based on a topological minimal bad sequence argument.

**Keywords:** Noetherian spaces · topology · well-quasi-orderings · initial algebras · Kruskal's Theorem · Higman's Lemma.

## **1 Introduction**

Let (E, ≤) be a set endowed with a quasi-order. A sequence (x<sub>n</sub>)<sub>n</sub> ∈ E<sup>ℕ</sup> is *good* whenever there exist i < j such that x<sub>i</sub> ≤ x<sub>j</sub>. A quasi-ordered set (E, ≤) is *well-quasi-ordered* (abbreviated as wqo) if every sequence is good. By calling a sequence *bad* whenever it is not good, well-quasi-orderings are equivalently defined as having no infinite bad sequences. This generalisation of well-founded total orderings can be used as a basis for proving program termination. For instance, algorithms like the one of Example 1.1 can be studied via well-quasi-orderings and the length of their bad sequences [5]. More generally, one can map the states of a run to a wqo via a so-called quasi-ranking function to both prove the termination of the program and gain information about its runtime [27, Chapter 2]. Let us provide a concrete example of this proof scheme.

*Example 1.1.* Let Alg be the algorithm with three integer variables a, b, c that non-deterministically performs one of the following operations until a, b or c becomes negative: (l) a, b, c ← a − 1, b, 2c or (r) a, b, c ← 2c, b − 1, 1.

**Lemma 1.2.** *For every choice of* (a, b, c) ∈ ℕ<sup>3</sup>*, the algorithm* Alg *terminates.*

*Proof.* Let us prove that Alg builds a bad sequence of triples when ordering ℕ<sup>3</sup> with (a<sub>1</sub>, b<sub>1</sub>, c<sub>1</sub>) ≤ (a<sub>2</sub>, b<sub>2</sub>, c<sub>2</sub>) whenever a<sub>1</sub> ≤ a<sub>2</sub>, b<sub>1</sub> ≤ b<sub>2</sub>, and c<sub>1</sub> ≤ c<sub>2</sub>. If (a<sub>i</sub>, b<sub>i</sub>, c<sub>i</sub>) and (a<sub>j</sub>, b<sub>j</sub>, c<sub>j</sub>) with i < j represent two configurations in a run of Alg, either only rule (l) was fired between them and a<sub>j</sub> < a<sub>i</sub>, or rule (r) was fired at least once, and b<sub>j</sub> < b<sub>i</sub>.

Because (ℕ<sup>3</sup>, ≤) is a well-quasi-ordering (see Dickson's Lemma in [28]), Alg terminates for every choice of initial triple (a, b, c) ∈ ℕ<sup>3</sup>.
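The argument can be checked experimentally on a few runs. The following Python sketch simulates Alg, resolving the non-deterministic choice with a seeded pseudo-random generator (an assumption of the sketch, as are the names `run_alg`, `leq` and `is_bad`), and verifies that the configurations visited in ℕ<sup>3</sup> form a bad sequence for the componentwise ordering.

```python
import random
from typing import List, Tuple

Config = Tuple[int, int, int]


def run_alg(a: int, b: int, c: int, rng: random.Random) -> List[Config]:
    """Return the configurations visited by one run of Alg, stopping as
    soon as a, b or c becomes negative (that last triple is not recorded)."""
    trace: List[Config] = []
    while a >= 0 and b >= 0 and c >= 0:
        trace.append((a, b, c))
        if rng.random() < 0.5:
            a, b, c = a - 1, b, 2 * c  # rule (l)
        else:
            a, b, c = 2 * c, b - 1, 1  # rule (r)
    return trace


def leq(x: Config, y: Config) -> bool:
    """Componentwise ordering on triples."""
    return all(s <= t for s, t in zip(x, y))


def is_bad(seq: List[Config]) -> bool:
    """A sequence is bad when no earlier element is below a later one."""
    return not any(
        leq(seq[i], seq[j])
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    )


# Every simulated run terminates and its trace is a bad sequence.
traces = [run_alg(3, 3, 1, random.Random(seed)) for seed in range(50)]
all_bad = all(is_bad(trace) for trace in traces)
```

Such a simulation only samples some runs and is of course no substitute for the proof; it merely illustrates the invariant that the proof relies on.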

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 456–476, 2023. https://doi.org/10.1007/978-3-031-30829-1_22

As a combinatorial tool, well-quasi-orderings appear frequently in varying fields of computer science, ranging from graph theory to number theory [18, 22, 21, 3]. Well-quasi-orderings have also been highly successful in proving the termination of verification algorithms. One critical application of well-quasi-orderings is to the verification of infinite state transition systems, via the study of so-called Well-Structured Transition Systems (WSTS) [1, 2, 16, 7].

**Noetherian spaces.** A major roadblock arises when using well-quasi-orders: the powerset of a well-quasi-order may fail to be one itself [26]. This is particularly problematic in the study of WSTS, where the powerset construction appears frequently [19, 29, 1]. To tackle this issue, one can justify that the quasi-orders of interest are not pathological, and are actually better quasi-orders [25, 23]. Another approach is offered by the topological notion of Noetherian space, which, as pointed out by Goubault-Larrecq, can act as a suitable generalisation of well-quasi-orderings that is preserved under the powerset construction [10].

The topological analogues to WSTS enjoy similar decidability properties, and there even exists an analogue to Karp and Miller's forward analysis for Petri nets [11]. Moreover, their topological nature makes it possible to verify systems beyond the reach of quasi-orderings, such as lossy concurrent polynomial programs [11]. This is possible because the polynomials are handled via results from algebraic geometry, through the notion of the Zariski topology over ℂ<sup>n</sup> [12, Exercise 9.7.53].

One drawback of the topological approach is that many topologies correspond to a single quasi-ordering. Hence, when the problem is better described via an ordering, one has to choose a specific topology, and there usually does not exist a finest one that is Noetherian.

**Inductively defined datatypes.** As for well-quasi-orders, Noetherian spaces are stable under finite products and finite sums [28, 12]. While this can be enough to describe the set of configurations of a Petri net using ℕ<sup>k</sup>, it does not allow one to talk about more complex data structures that are typically defined inductively, such as lists and trees. To make the above statement precise, let **1** be the singleton set, A + B be the disjoint union of A and B, and A × B their cartesian product. Then, the set of finite words over an alphabet Σ is precisely the least fixed point of F : X ↦ **1** + Σ × X. Similarly, the set of finite trees over Σ equals lfp<sub>X</sub>.Σ × X<sup>∗</sup>, where lfp<sub>X</sub>.F(X) denotes the least fixed point of F.
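The least fixed point describing finite words can be approximated by iterating F from the empty set. The Python sketch below is only an illustration under explicit assumptions: words are encoded as tuples, the unique element of **1** is the empty tuple, a pair (s, w) of Σ × X is encoded as the tuple obtained by prepending s to w, and the names `F` and `iterate` are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Tuple

Word = Tuple[str, ...]


def F(X: FrozenSet[Word], sigma: str) -> FrozenSet[Word]:
    """One application of X -> 1 + Sigma x X: either the empty word,
    or a letter of sigma followed by an element of X."""
    return frozenset({()}) | frozenset((s,) + w for s in sigma for w in X)


def iterate(sigma: str, n: int) -> FrozenSet[Word]:
    """Return F^n applied to the empty set."""
    X: FrozenSet[Word] = frozenset()
    for _ in range(n):
        X = F(X, sigma)
    return X


# After n iterations one obtains exactly the words of length < n.
stage3 = iterate("ab", 3)
words_up_to_2 = frozenset(
    tuple(w) for k in range(3) for w in product("ab", repeat=k)
)
```

Each finite stage is included in the next one, and the set of all finite words is the union of these stages.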

In the realm of well-quasi-orderings, the specific cases of finite words and finite trees are handled respectively via Higman's Lemma [18] and Kruskal's Tree Theorem [22]. Let us recall that a word w embeds into a word w′ (written w ≤<sup>∗</sup> w′) whenever there exists a strictly increasing map h : |w| → |w′| such that w<sub>i</sub> ≤ w′<sub>h(i)</sub> for 1 ≤ i ≤ |w|. Similarly, a tree t embeds into a tree t′ (written t ≤<sub>tree</sub> t′) whenever there exists a map from nodes of t to nodes of t′ respecting the least common ancestor relation, and increasing the colours of the nodes. Proofs that finite words and finite trees preserve well-quasi-orderings typically rely on a so-called minimal bad sequence argument due to Nash-Williams [24]. However, the argument is quite subtle, and needs to be handled with care [9, 30]. In addition, the argument is not compositional and has to be slightly modified whenever a new inductive construction is desired [as in, e.g., 4, 3].
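For concreteness, the word embedding can be decided by a greedy left-to-right scan, which always matches each letter as early as possible. The following Python sketch assumes that the quasi-order on letters is given as a Boolean function `leq` (equality by default); the name `embeds` is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def embeds(
    w: Sequence[A],
    w2: Sequence[A],
    leq: Callable[[A, A], bool] = lambda x, y: x == y,
) -> bool:
    """Decide whether w embeds into w2, i.e. whether there is a strictly
    increasing map h with leq(w[i], w2[h(i)]) for every position i of w."""
    j = 0  # next candidate position in w2
    for x in w:
        # Skip the letters of w2 that are not above x.
        while j < len(w2) and not leq(x, w2[j]):
            j += 1
        if j == len(w2):
            return False  # no letter left to receive x
        j += 1  # x is mapped to position j, continue strictly after it
    return True
```

For instance, `embeds("ace", "abcde")` holds whereas `embeds("aec", "abcde")` does not, since the map h has to be strictly increasing.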

This picture has been adapted by Goubault-Larrecq to the topological setting by proposing analogues of the word embedding and tree embedding, together with a proof that they preserve Noetherian spaces [12, Section 9.7]. However, both the definitions and the proofs have an increased complexity, as they rely on an adapted "topological minimal bad sequence argument" that appears to be even more subtle [14, errata n. 26]. Moreover, the newly introduced topologies have involved definitions often relying on ad-hoc constructions.

In the case of well-quasi-orderings, two generic fixed point constructions have been proposed to handle inductively defined datatypes [17, 8]. In these frameworks, lfp<sub>X</sub>.F(X) is guaranteed to be a well-quasi-ordering provided that F is a "well-behaved functor" of quasi-orders. Both proposals, while relying on different categorical notions, successfully recover Higman's word embedding and Kruskal's tree embedding through their respective definitions as least fixed points. As a side effect, they reinforce the idea that these two quasi-orders are somehow canonical.

In the case of Noetherian spaces, no equivalent framework exists to build inductive datatypes, and the notions of "well-behaved" constructors from [17, 8] rule out the use of important Noetherian spaces, as they require that an element a ∈ F(X) has been built using *finitely many* elements of X: while this is the case for finite words and finite trees, it does not hold for the arbitrary powerset. Moreover, there have been recent advances in placing Noetherian topologies over spaces that are not straightforwardly obtained through "well-behaved" definitions, such as infinite words [13], or even ordinal length words [15].

#### **1.1 Contributions of this paper**

In this paper, we propose a least fixed point theorem for Noetherian topologies. This is done in a way that greatly differs from the categorical frameworks introduced in the study of well-quasi-orders, as the construction of the space is entirely *decoupled* from the construction of the topology. In particular, the carrier set X itself need not be inductively defined.

In this setting, we consider a fixed set X and a map R from topologies τ over X to topologies R(τ) over X. Because the set of topologies over X is a complete lattice, it suffices to ask for R to be monotone to guarantee that it has a least fixed point, which we write lfp<sub>τ</sub>.R(τ). In general, this least fixed point will not be Noetherian, but we show that a simple sufficient condition on R guarantees that it is. This main theorem (Theorem 3.21) encapsulates all the complexity of the topological adaptations of the minimal bad sequence arguments [12, Section 9.7], and we believe that it is of independent interest.

The necessity to separate the construction of the set of points from the construction of the topology might be perceived as a weakness of the theory, when it is in fact a strength of our approach. We illustrate this by giving a shorter proof that the words of ordinal length are Noetherian [15], without providing an inductive definition of the space. As an illustration of the versatility of our framework, we introduce a reasonable topology over ordinal branching trees (with finite depth), and prove that it is Noetherian using the same technique.

In the specific cases where the space of interest can be obtained as a least fixed point of a "well-behaved" functor, we show how Theorem 3.21 can be used to generalise the categorical framework of Hasegawa [17] to a topological setting. As well as adding inductively defined topologies (hence, inductively defined datatypes) to the theory of Noetherian spaces, this provides a reasonable answer to the canonicity issue previously mentioned.

**Outline.** In Section 2 we recall some of the main results in the theory of Noetherian spaces. In Section 3 we prove our main result (Theorem 3.21). In Section 4 we explore how this result covers existing topological results in the literature, and provide a new non-trivial Noetherian space (Definition 4.7). In Section 5, we leverage our main result to devise a Noetherian topology over inductively defined datatypes (Theorem 5.13), and prove that this generalises the work of Hasegawa over well-quasi-orders (Theorem 5.15).

## **2 A Quick Primer on Noetherian Topologies**

A *topological space* is a pair (X, τ) where τ ⊆ P(X), τ is stable under finite intersections, and τ is stable under arbitrary unions. A subset U ⊆ X is an *open subset* when U ∈ τ, and a *closed subset* when X \ U ∈ τ. As an order-theoretic counterpart to open and closed subsets, we say that a subset U of a quasi-ordered set (E, ≤) is *upwards-closed* whenever for all x ∈ U, x ≤ y implies y ∈ U. Similarly, a subset is *downwards-closed* whenever its complement is upwards-closed. One can convert back and forth between the two as follows:

*Notation 2.1.* Let (E, ≤) be a quasi-order and (X, τ) be a topological space. The *Alexandroff topology* alex(≤) over E is the collection of upwards-closed subsets of E. The *specialisation preorder* ≤<sub>τ</sub> is defined via x ≤<sub>τ</sub> y whenever for every open subset U ∈ τ, if x ∈ U then y ∈ U.

It is an easy check that the specialisation pre-order of the Alexandroff topology of a quasi-order ≤ is the quasi-order itself. Beware that several topologies can share the same specialisation pre-order ≤, and among those, the Alexandroff topology is the finest.
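On a finite quasi-order, this round trip can be checked by brute force. The Python sketch below enumerates the upwards-closed subsets of a small set and recomputes the specialisation preorder from them; the function names and the example (divisibility on {1, 2, 3, 6}) are assumptions made for the illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def alexandroff(E: Iterable[int], leq: Callable[[int, int], bool]) -> Set[FrozenSet[int]]:
    """Collect all upwards-closed subsets of the finite quasi-order (E, leq)."""
    elems = list(E)
    opens: Set[FrozenSet[int]] = set()
    for r in range(len(elems) + 1):
        for subset in combinations(elems, r):
            U = frozenset(subset)
            if all(y in U for x in U for y in elems if leq(x, y)):
                opens.add(U)
    return opens


def specialisation(E: Iterable[int], opens: Set[FrozenSet[int]]) -> Set[Tuple[int, int]]:
    """Pairs (x, y) such that every open subset containing x contains y."""
    elems = list(E)
    return {
        (x, y)
        for x in elems
        for y in elems
        if all(y in U for U in opens if x in U)
    }


def is_topology(E: Iterable[int], opens: Set[FrozenSet[int]]) -> bool:
    """On a finite set, binary unions and intersections are enough."""
    whole = frozenset(E)
    return (
        frozenset() in opens
        and whole in opens
        and all(U & V in opens and U | V in opens for U in opens for V in opens)
    )


E = [1, 2, 3, 6]


def divides(x: int, y: int) -> bool:
    return y % x == 0


opens = alexandroff(E, divides)
recovered = specialisation(E, opens)
```

Here `recovered` coincides with the divisibility relation on E. Note that a finite example cannot exhibit two distinct topologies with the same specialisation preorder, since on a finite set the topology is determined by its specialisation preorder.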

We can now build the topological analogue to wqos through the notion of compactness: a subset K of X is defined as *compact* whenever from every family (U<sub>i</sub>)<sub>i∈I</sub> of open sets such that K ⊆ ⋃<sub>i∈I</sub> U<sub>i</sub>, one can extract a finite subset J ⊆ I such that K ⊆ ⋃<sub>i∈J</sub> U<sub>i</sub>. A quasi-order (E, ≤) is wqo if and only if every subset K of E is compact for alex(≤). Generalising this property to arbitrary topological spaces, a topological space (X, τ) is said to be a *Noetherian space* whenever every subset of X is compact.


**Table 1.** An *algebra* of Noetherian spaces [see 10, 12, 15].

Remark 2.2. A space $(X, \tau)$ is Noetherian if and only if for every increasing sequence of open subsets $(U_i)_{i \in \mathbb{N}}$, there exists $j \in \mathbb{N}$ such that $\bigcup_{i \in \mathbb{N}} U_i = \bigcup_{i \leq j} U_i$.

In order to inductively define Noetherian spaces, we will often rely on basic constructors such as the disjoint sum and the finite product. For completeness, we recall in Table 1 the usual constructors that preserve Noetherian spaces. This table also illustrates the versatility of the concept, which encompasses both the algebraic properties of C<sup>k</sup> and the order properties of well-quasi-orders.

## **3 Refinements of Noetherian topologies**

Let us fix a set $X$. The collection of topologies over $X$ is itself a set, and forms a complete lattice for inclusion. In this lattice, the least element is the trivial topology $\tau_{\mathsf{triv}} := \{\emptyset, X\}$, and the greatest element is the discrete topology $\mathcal{P}(X)$. Thanks to Tarski's fixed point theorem, every monotone function $\mathsf{R}$ mapping topologies over $X$ to topologies over $X$ has a least fixed point, which can be obtained by transfinitely iterating $\mathsf{R}$ from the trivial topology. Writing $\mathsf{lfp}_\tau.\mathsf{R}(\tau)$ for the least fixed point of $\mathsf{R}$, our goal is to provide sufficient conditions for $(X, \mathsf{lfp}_\tau.\mathsf{R}(\tau))$ to be Noetherian.

**Definition 3.1.** A refinement function over a set $X$ is a function $\mathsf{R}$ mapping topologies over $X$ to topologies over $X$. Moreover, we assume that $\mathsf{R}(\tau)$ is Noetherian whenever $\tau$ is, and that $\mathsf{R}(\tau) \subseteq \mathsf{R}(\tau')$ when $\tau \subseteq \tau'$.

As $(X, \tau_{\mathsf{triv}})$ is always Noetherian, $(X, \mathsf{R}^n(\tau_{\mathsf{triv}}))$ is Noetherian for all $n \in \mathbb{N}$ and every refinement function $\mathsf{R}$. However, it remains unclear whether the transfinite iterations needed to reach a fixed point preserve Noetherian spaces.

We demonstrate in Example 3.2 how to obtain the topology $\mathsf{alex}(\leq)$ over $\mathbb{N}$ as a least fixed point of some simple refinement function. Before that, let us define the notion of upwards-closure: given a quasi-order $(E, \leq)$ and a set $E' \subseteq E$, let us define the upwards-closure of $E'$, written ${\uparrow}_{\leq} E'$, as the set of elements of $E$ that are greater than or equal to some element of $E'$.

*Example 3.2 (Natural Numbers).* Over $X := \mathbb{N}$, one can define $\mathsf{Div}(\tau)$ as the collection of the sets ${\uparrow}_{\leq} (U + 1)$ for $U \in \tau$, plus $\mathbb{N}$ itself. Then $\mathsf{Div}(\tau_{\mathsf{triv}}) = \{\emptyset, {\uparrow}_{\leq} 1, \mathbb{N}\}$, $\mathsf{Div}^2(\tau_{\mathsf{triv}}) = \{\emptyset, {\uparrow}_{\leq} 1, {\uparrow}_{\leq} 2, \mathbb{N}\}$. More generally, for every $k \geq 0$, $\mathsf{Div}^k(\tau_{\mathsf{triv}}) = \{\emptyset, {\uparrow}_{\leq} 1, \dots, {\uparrow}_{\leq} k, \mathbb{N}\}$. It is an easy check that $\mathsf{lfp}_\tau.\mathsf{Div}(\tau)$ is precisely $\mathsf{alex}(\leq)$, which is Noetherian because $(\mathbb{N}, \leq)$ is a well-quasi-ordering.
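The first iterations of Div in Example 3.2 can be replayed on a computer. The sketch below relies on an encoding that is an assumption of this illustration and not part of the text: the empty set is encoded as `None`, and the upwards-closed set ↑n is encoded by the integer `n`, so that ℕ itself is `0`. This suffices because every set produced by iterating Div from the trivial topology has this shape.

```python
EMPTY = None  # encodes the empty set; an integer n encodes {n, n+1, ...}


def div(tau):
    """One application of Div: shift every open set by one, then add N."""
    shifted = {EMPTY if u is EMPTY else u + 1 for u in tau}
    return frozenset(shifted | {0})


def iterate(f, tau, k):
    """Apply f to tau exactly k times."""
    for _ in range(k):
        tau = f(tau)
    return tau


tau_triv = frozenset({EMPTY, 0})  # the trivial topology {emptyset, N}
stages = [iterate(div, tau_triv, k) for k in range(5)]
```

Each stage strictly contains the previous one, so no finite iterate is a fixed point: the least fixed point is only reached at the first limit stage, where all the sets ↑n are present.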

## **3.1 An ill-behaved refinement function**

Not all refinement functions behave as nicely as in Example 3.2, and one can obtain non-Noetherian topologies via their least fixed points.

Let us consider for this section $\Sigma := \{a, b\}$ with the discrete topology, i.e., $\{\emptyset, \{a\}, \{b\}, \Sigma\}$. Let us now build the set $\Sigma^{\ast}$ of finite words over $\Sigma$. Whenever $U$ and $V$ are subsets of $\Sigma^{\ast}$, let us write $UV$ for their concatenation, defined as $\{uv : u \in U, v \in V\}$. To construct an ill-behaved refinement function, we will associate to a topology $\tau$ the set $\{UV : U \in \{\emptyset, \{a\}, \{b\}, \Sigma\}, V \in \tau\}$. However, the latter fails to be a topology in general. This problem frequently appears in this paper, and is solved by considering the so-called generated topology.

Let us briefly recall that for every set $X$ and collection of subsets $B \subseteq \mathcal{P}(X)$, one can construct the topology generated from $B$ as the least topology on $X$ containing $B$. This topology coincides with the one containing arbitrary unions of finite intersections of subsets in $B$. We say that $B$ is a *subbasis* of $\tau$ when $\tau$ is the topology generated by $B$. Alexander's Subbase Lemma allows one to study Noetherian spaces in this setting [12, Thm. 4.4.29]: it states that checking whether a subset $K$ of $X$ is compact in $\tau$ can be done by considering only open subsets in $B$, i.e., by checking that for every family $(U_i)_{i \in I}$ of elements of a subbasis $B$ of $\tau$ such that $K \subseteq \bigcup_{i \in I} U_i$, one can extract a finite subset $J \subseteq I$ such that $K \subseteq \bigcup_{j \in J} U_j$.
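On a finite set, the description of the generated topology as "arbitrary unions of finite intersections" can be computed directly. The following is a minimal sketch under that finiteness assumption; the name `generated_topology` and the three-point example are hypothetical and only serve as an illustration.

```python
from itertools import combinations


def generated_topology(points, subbasis):
    """Topology on a finite set generated by a collection of subsets."""
    points = frozenset(points)
    subbasis = [frozenset(b) for b in subbasis]

    # Finite intersections; the empty intersection is the whole space.
    basis = {points}
    for r in range(1, len(subbasis) + 1):
        for family in combinations(subbasis, r):
            basis.add(frozenset.intersection(*family))

    # Arbitrary unions; the empty union is the empty set.
    opens = {frozenset()}
    for b in basis:
        opens |= {u | b for u in opens}
    return opens


topology = generated_topology({1, 2, 3}, [{1, 2}, {2, 3}])
```

Here the subbasis {1, 2}, {2, 3} is not itself a topology (it lacks the intersection {2}, the empty set and the whole space), which is the same phenomenon as for the sets UV above.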

**Definition 3.3.** *Let $\mathsf{R}_{\mathsf{pref}}$ be the function mapping a topology $\tau$ over $\Sigma^{\ast}$ to the topology generated by the sets $UV$ where $U \subseteq \Sigma$ and $V \in \tau$.*

We refer to Figure 1 for a graphical presentation of the first two iterations of the refinement function $\mathsf{R}_{\mathsf{pref}}$. For the sake of completeness, let us compute $\mathsf{lfp}_\tau.\mathsf{R}_{\mathsf{pref}}(\tau)$, which is the Alexandroff topology of the prefix ordering on words.

**Definition 3.4.** *The* prefix topology<sup>3</sup> *$\tau_{\mathsf{pref}^{\ast}}$ over $\Sigma^{\ast}$ is generated by the following open sets: $U_1 \dots U_n \Sigma^{\ast}$, where $n \geq 0$ and $U_i \subseteq \Sigma$.*

**Lemma 3.5.** *The prefix topology over $\Sigma^{\ast}$ is the least fixed point of $\mathsf{R}_{\mathsf{pref}}$.*

**Lemma 3.6.** *The function $\mathsf{R}_{\mathsf{pref}}$ is a refinement function.*

*Proof.* It is an easy check that whenever $\tau \subseteq \tau'$, $\mathsf{R}_{\mathsf{pref}}(\tau) \subseteq \mathsf{R}_{\mathsf{pref}}(\tau')$. Now, assume that $\tau$ is Noetherian; it remains to prove that $\mathsf{R}_{\mathsf{pref}}(\tau)$ is Noetherian. Consider a subset $E \subseteq \Sigma^{\ast}$ and let us prove that $E$ is compact in $\mathsf{R}_{\mathsf{pref}}(\tau)$.

<sup>3</sup> This definition differs from what is called the "prefix topology" in the literature [see 6, 12, resp. Section 8 and Exercise 9.7.36].

**Fig. 1.** Iterating $\mathsf{R}_{\mathsf{pref}}$ over $\Sigma^{\ast}$. On the left the trivial topology $\tau_{\mathsf{triv}}$, followed by $\mathsf{R}_{\mathsf{pref}}$, and on the right $\mathsf{R}_{\mathsf{pref}}^2$.

For that, we consider an open cover $E \subseteq \bigcup_{i \in I} W_i$, where $W_i \in \mathsf{R}_{\mathsf{pref}}(\tau)$. Thanks to Alexander's subbase lemma, we can assume without loss of generality that $W_i$ is a subbasic open set of $\mathsf{R}_{\mathsf{pref}}(\tau)$, that is, $W_i = U_i V_i$ with $U_i \subseteq \Sigma$ and $V_i \in \tau$.

Since $(\Sigma^{\ast}, \tau) \times (\Sigma^{\ast}, \tau)$ is Noetherian (see Table 1), there exists a finite set $J \subseteq I$ such that $\bigcup_{i \in J} U_i \times V_i = \bigcup_{i \in I} U_i \times V_i$. This implies that $E \subseteq \bigcup_{i \in J} U_i V_i$, and provides a finite subcover of $E$.

The sequence $\bigcup_{0 \leq i \leq k} a^i b \Sigma^{\ast}$, for $k \in \mathbb{N}$, is a strictly increasing sequence of opens. Therefore, the prefix topology is not Noetherian. The terms $a^i b \Sigma^{\ast}$ can be observed in Figure 1 as a diagonal of incomparable open sets.

**Corollary 3.7.** *The topology $\mathsf{lfp}_\tau.\mathsf{R}_{\mathsf{pref}}(\tau)$ is not Noetherian.*
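The strictly increasing chain of open sets used to show that the prefix topology is not Noetherian can be observed on a finite truncation of the space of words. The sketch below is only an illustration: it assumes words of length at most 6 over the alphabet {a, b}, and the helper names `words` and `chain_member` are hypothetical.

```python
from itertools import product


def words(alphabet, max_len):
    """All words over `alphabet` of length at most `max_len`."""
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
    ]


def chain_member(k, universe):
    """Words of `universe` in the union of a^i b Sigma^* for 0 <= i <= k."""
    return {
        w
        for w in universe
        if any(w.startswith("a" * i + "b") for i in range(k + 1))
    }


universe = words("ab", 6)
chain = [chain_member(k, universe) for k in range(5)]
```

Each step of the chain adds the word made of k+1 letters a followed by b, which has no prefix of the form a^i b with i ≤ k; on the full space of words the chain therefore never stabilises.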

The prefix topology is not Noetherian, even when starting from a finite alphabet. However, we claimed in Section 1 that there is a natural generalisation of the subword embedding to topological spaces which is Noetherian. Before introducing this topology, let us write $[U_1, \dots, U_n]$ as a shorthand notation for the set $\Sigma^{\ast} U_1 \Sigma^{\ast} \dots \Sigma^{\ast} U_n \Sigma^{\ast}$.

**Definition 3.8 (Subword topology [12, Definition 9.7.26]).** *Given a topological space $(\Sigma, \tau)$, the space $\Sigma^{\ast}$ of finite words over $\Sigma$ can be endowed with the* subword topology*, generated by the open sets $[U_1, \dots, U_n]$ when $U_i \in \tau$.*
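Membership of a word in a subbasic open set [U1, ..., Un] of the subword topology can be decided by a greedy left-to-right scan. The following sketch assumes that words are Python strings and that each Ui is given as a finite set of letters; the names `in_subword_open` and `is_subword` are hypothetical.

```python
def in_subword_open(word, opens):
    """Decide whether word lies in Sigma^* U_1 Sigma^* ... U_n Sigma^*.

    The iterator is shared between the successive searches, so each U_i is
    matched strictly to the right of the letter matched for U_{i-1}.
    """
    letters = iter(word)
    return all(any(letter in u for letter in letters) for u in opens)


def is_subword(u, v):
    """Scattered subword relation: u embeds in v letter by letter."""
    return in_subword_open(v, [{c} for c in u])


example = in_subword_open("abba", [{"a"}, {"b"}, {"a"}])
```

With singleton sets the test specialises to the scattered subword relation, which matches the fact that these sets are upwards-closed for the subword ordering.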

The *topological Higman lemma* [12, Theorem 9.7.33] states that the subword topology over Σ<sup>∗</sup> is Noetherian if and only if Σ is Noetherian. Although the subword topology might seem ad-hoc, it can be validated as a generalisation of the subword embedding because the subword topology of alex(≤) equals the Alexandroff topology of the subword ordering of ≤, for every quasi-order ≤ over Σ [12, Exercise 9.7.30]. Let us now reverse engineer a refinement function whose least fixed point is the subword topology.

**Definition 3.9.** *Let $(\Sigma, \theta)$ be a topological space. Let $\mathsf{E}^\theta_{\mathsf{words}}$ be defined as mapping a topology $\tau$ over $\Sigma^{\ast}$ to the topology generated by the following sets: ${\uparrow}_{\leq^{\ast}} UV$ for $U, V \in \tau$; and ${\uparrow}_{\leq^{\ast}} W$, for $W \in \theta$.*

**Fig. 2.** The topology $(\mathsf{E}^\theta_{\mathsf{words}})^2(\tau_{\mathsf{triv}})$, with bold red arrows for the inclusions that were not present between the "analogous sets" in $\mathsf{R}_{\mathsf{pref}}^2(\tau_{\mathsf{triv}})$. We have taken $\theta$ to be the discrete topology over $\Sigma$.

**Lemma 3.10.** *Let $(\Sigma, \theta)$ be a topological space. The subword topology over $\Sigma^{\ast}$ is the least fixed point of $\mathsf{E}^\theta_{\mathsf{words}}$.*

In order to show that $\mathsf{E}^\theta_{\mathsf{words}}$ is a refinement function, we first claim that the two parts of the topology can be dealt with separately.

**Lemma 3.11 ([12, Proposition 9.7.18]).** *If $(X, \tau)$ and $(X, \tau')$ are Noetherian, then $X$ endowed with the topology generated by $\tau \cup \tau'$ is Noetherian.*

**Lemma 3.12.** *Let $(\Sigma, \theta)$ be a Noetherian topological space. The map $\mathsf{E}^\theta_{\mathsf{words}}$ is a refinement function over $\Sigma^{\ast}$.*

*Proof.* We leave the monotonicity of $\mathsf{E}^\theta_{\mathsf{words}}$ as an exercise and focus on the proof that $\mathsf{E}^\theta_{\mathsf{words}}(\tau)$ is Noetherian whenever $\tau$ is. Thanks to Lemma 3.11, it suffices to prove that the topology generated by the sets ${\uparrow}_{\leq^{\ast}} UV$ ($U, V$ open in $\tau$), and the topology generated by the sets ${\uparrow}_{\leq^{\ast}} W$ ($W$ open in $\theta$) are Noetherian.

Let $({\uparrow}_{\leq^{\ast}} U_i V_i)_{i \in \mathbb{N}}$ be a sequence of open sets. Because Noetherian topologies are closed under products (see Table 1), there exists $k$ such that $\bigcup_{i \leq k} U_i \times V_i = \bigcup_{i \in \mathbb{N}} U_i \times V_i$. Hence, $\bigcup_{i \leq k} {\uparrow}_{\leq^{\ast}} U_i V_i = \bigcup_{i \in \mathbb{N}} {\uparrow}_{\leq^{\ast}} U_i V_i$.

Let $({\uparrow}_{\leq^{\ast}} W_i)_{i \in \mathbb{N}}$ be a sequence of open sets. Because $\theta$ is Noetherian, there exists $k$ such that $\bigcup_{i \leq k} W_i = \bigcup_{i \in \mathbb{N}} W_i$, hence $\bigcup_{i \leq k} {\uparrow}_{\leq^{\ast}} W_i = \bigcup_{i \in \mathbb{N}} {\uparrow}_{\leq^{\ast}} W_i$.

We have designed two refinement functions $\mathsf{R}_{\mathsf{pref}}$ and $\mathsf{E}^\theta_{\mathsf{words}}$ over $\Sigma^{\ast}$. Fixing $\theta$ to be the discrete topology over $\Sigma$, the least fixed point of $\mathsf{R}_{\mathsf{pref}}$ is not Noetherian while the least fixed point of $\mathsf{E}^\theta_{\mathsf{words}}$ is. We have depicted the result of iterating $\mathsf{E}^\theta_{\mathsf{words}}$ twice over the trivial topology in Figure 2. As opposed to $\mathsf{R}_{\mathsf{pref}}$, the "diagonal" elements are comparable for inclusion.

## **3.2 Well-behaved refinement functions**

In this section, we will show how the behaviour of refinement functions with respect to subsets acts as a sufficient condition to separate the well-behaved ones from the others. In order to make the idea of computing the refinement function directly over a subset precise, we will replace a subset with the induced topology by a "restricted" topology over the whole space.

**Definition 3.13.** *Let $(X, \tau)$ be a topological space and $H$ be a closed subset of $X$. Define the* subset restriction *$\tau|H$ to be the topology generated by the open subsets $U \cap H$ where $U$ ranges over $\tau$.*

Let $X$ be a topological space, and $H$ be a proper closed subset of $X$. The space $X$ endowed with $\tau|H$ has a lattice of open sets that is isomorphic to the one of the space $H$ endowed with the topology induced by $\tau$, except for the entire space $X$ itself. As witnessed by Example 3.14, the two spaces are in general not homeomorphic.

*Example 3.14.* Let $\mathbb{R}$ be endowed with the usual metric topology. The set $\{a\}$ is a closed set when $a \in \mathbb{R}$. The induced topology over $\{a\}$ is $\{\emptyset, \{a\}\}$. The subset restriction of the topology to $\{a\}$ is $\tau_a := \{\emptyset, \{a\}, \mathbb{R}\}$. Clearly, $(\mathbb{R}, \tau_a)$ and $(\{a\}, \tau_{\mathsf{triv}})$ are not homeomorphic.

In order to build intuition, let us consider the special case of an Alexandroff topology over $X$ and compute the specialisation preorder of $\tau|H$, where $H$ is a downwards closed set.

**Lemma 3.15.** *Let $\tau = \mathsf{alex}(\leq)$ over a set $X$, and $x, y \in X$. Then, $x \leq_{\tau|H} y$ if and only if $x \leq_\tau y \wedge y \in H$ or $x \notin H$. In other words, $H^c$ is collapsed to an equivalence class below $H$ itself.*
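The characterisation of the specialisation preorder of a subset restriction can be sanity-checked on a finite example. The sketch below is an illustration under explicit assumptions: the carrier {1, 2, 3, 4, 6} with divisibility and the downwards-closed set H = {1, 2, 3} are hypothetical choices, and the restriction is computed as the sets U ∩ H together with the whole space, which is already closed under unions and finite intersections.

```python
from itertools import combinations


def up_sets(points, leq):
    """Alexandroff topology: upwards-closed subsets of a finite quasi-order."""
    subsets = [
        frozenset(c)
        for r in range(len(points) + 1)
        for c in combinations(points, r)
    ]
    return {
        u
        for u in subsets
        if all(y in u for x in u for y in points if leq(x, y))
    }


def specialisation(points, topology):
    """Pairs (x, y) such that every open set containing x contains y."""
    return {
        (x, y)
        for x in points
        for y in points
        if all(y in u for u in topology if x in u)
    }


def divides(x, y):
    return y % x == 0


points = [1, 2, 3, 4, 6]
X = frozenset(points)
H = frozenset({1, 2, 3})  # downwards closed for divisibility, hence closed

tau = up_sets(points, divides)
tau_H = {u & H for u in tau} | {X}  # the subset restriction of tau to H

lhs = specialisation(points, tau_H)
rhs = {
    (x, y)
    for x in points
    for y in points
    if (divides(x, y) and y in H) or x not in H
}
```

In this example the points 4 and 6, which lie outside H, end up below every other point, as described by the collapse of the complement of H.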

**Definition 3.16.** *A* topology expander *is a refinement function $\mathsf{E}$ that satisfies the following extra property: for every Noetherian topology $\tau$ satisfying $\tau \subseteq \mathsf{E}(\tau)$, for every closed set $H$ in $\tau$, $\mathsf{E}(\tau)|H \subseteq \mathsf{E}(\tau|H)|H$.*

**Lemma 3.17.** *The refinement function $\mathsf{R}_{\mathsf{pref}}$ is not a topology expander.*

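*Proof.* Let us consider $\tau := \{\emptyset, a\Sigma^{\ast}, b\Sigma^{\ast}, \Sigma^{\ast}\}$. Remark that $H := a\Sigma^{\ast} \cup \{\varepsilon\}$ is a closed subset because $\Sigma = \{a, b\}$. It is an easy check that $\mathsf{R}_{\mathsf{pref}}(\tau)|H = \{\emptyset, aa\Sigma^{\ast}, ab\Sigma^{\ast}, a\Sigma^{\ast}, \Sigma^{\ast}\} \neq \{\emptyset, aa\Sigma^{\ast}, a\Sigma^{\ast}, \Sigma^{\ast}\} = \mathsf{R}_{\mathsf{pref}}(\tau|H)|H$.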

**Lemma 3.18.** *When $\theta$ is Noetherian, $\mathsf{E}^\theta_{\mathsf{words}}$ is a topology expander.*

*Proof.* We have proven in Lemma 3.12 that $\mathsf{E}^\theta_{\mathsf{words}}$ is a refinement function. Let us now prove that it is a topology expander.

Let $\tau$ be a Noetherian topology over $\Sigma^{\ast}$, such that $\tau \subseteq \mathsf{E}^\theta_{\mathsf{words}}(\tau)$. Let $H$ be a closed subset of $(\Sigma^{\ast}, \tau)$. Notice that as $H$ is closed in $\tau$, and since $\tau \subseteq \mathsf{E}^\theta_{\mathsf{words}}(\tau)$, $H$ is downwards closed for $\leq^{\ast}$. As a consequence, $({\uparrow}_{\leq^{\ast}} UV) \cap H = ({\uparrow}_{\leq^{\ast}} (U \cap H)(V \cap H)) \cap H$. Hence, $\mathsf{E}^\theta_{\mathsf{words}}(\tau)|H \subseteq \mathsf{E}^\theta_{\mathsf{words}}(\tau|H)|H$.

## **3.3 Iterating Expanders**

Our goal is now to prove that topology expanders are refinement functions that can be safely iterated. For that, let us first define precisely what "iterating transfinitely" a refinement function means.

**Definition 3.19.** *Let* (X , τ ) *be a topological space, and* E *be a topology expander. The* limit topology E<sup>α</sup>(τ ) *is defined as:* τ *when* α = 0*,* E(E<sup>β</sup>(τ )) *when* α = β + 1*, and as the join of the topologies* E<sup>β</sup>(τ ) *for all* β<α*, when* α *is a limit ordinal.*

We devote the rest of this section to proving our main theorem, which immediately implies that least fixed points of topology expanders are Noetherian. Notice that the theorem is trivial whenever α is a successor ordinal.

**Proposition 3.20.** *Let $\alpha$ be an ordinal, $\tau$ be a topology, and $\mathsf{E}$ be a topology expander. If $\mathsf{E}^\beta(\tau)$ is Noetherian for all $\beta < \alpha$, and $\tau \subseteq \mathsf{E}(\tau)$, then $\mathsf{E}^\alpha(\tau)$ is Noetherian.*

**Theorem 3.21 (Main Result).** *Let* X *be a set and* E *be a topology expander. The least fixed point of* E *is a Noetherian topology over* X *.*

**The topological minimal bad sequence argument.** In order to prove Theorem 3.21, we will use a topological minimal bad sequence argument. To that end, let us first introduce a well-founded ordering over the elements of $\mathsf{E}^\alpha(\tau)$. With an open set $U \in \mathsf{E}^\alpha(\tau)$, we associate a depth $\mathsf{depth}(U)$, defined as the smallest ordinal $\beta \leq \alpha$ such that $U \in \mathsf{E}^\beta(\tau)$. We then define $U \trianglelefteq V$ to hold whenever $\mathsf{depth}(U) \leq \mathsf{depth}(V)$, and $U \triangleleft V$ whenever $\mathsf{depth}(U) < \mathsf{depth}(V)$. It is an easy check that this is a well-founded total quasi-order over $\mathsf{E}^\alpha(\tau)$.

As a first step towards proving that E<sup>α</sup>(τ ) is Noetherian for a limit ordinal α, we first reduce the problem to open subsets of depth strictly less than α itself.

**Lemma 3.22.** *Let* α *be a limit ordinal, and* E *be a topology expander. The topology* E<sup>α</sup>(τ ) *has a subbasis of elements of depth strictly below* α*.*

Let us recall the notion of topological bad sequence designed by Goubault-Larrecq [12, Lemma 9.7.31] in the proof of the Topological Kruskal Theorem, adapted to our ordering of subbasic open sets.

**Definition 3.23.** *Let $(X, \tau)$ be a topological space. A sequence $U = (U_i)_{i \in \mathbb{N}}$ of open subsets is* good *if there exists $i \in \mathbb{N}$ such that $U_i \subseteq \bigcup_{j<i} U_j$. A sequence that is not good is called* bad*.*
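On a finite prefix of a sequence of open sets, the definition of a good sequence amounts to looking for the first set that is covered by its predecessors. The sketch below assumes that open sets are given as finite Python sets; the function name `first_good_index` is hypothetical, and a result of `None` only means that the prefix inspected so far contains no witness.

```python
def first_good_index(sequence):
    """Smallest i such that sequence[i] is included in the union of the
    earlier sets, or None if the given finite prefix has no such index."""
    covered = set()
    for i, u in enumerate(sequence):
        if set(u) <= covered:
            return i
        covered |= set(u)
    return None


good_prefix = [{1}, {2}, {1, 2}, {3}]
bad_prefix = [{1}, {1, 2}, {1, 2, 3}]
```

A strictly increasing chain such as `bad_prefix` never produces a witness, which is the finite shadow of the fact that non-Noetherian spaces admit bad sequences.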

**Lemma 3.24.** *Let $\alpha$ be a limit ordinal, and $\mathsf{E}$ be a topology expander such that $\mathsf{E}^\alpha(\tau)$ is not Noetherian. Then, there exists a bad sequence $U$ of open subsets in $\mathsf{E}^\alpha(\tau)$ of depth less than $\alpha$ that is lexicographically minimal for $\trianglelefteq$. Such a sequence is called* minimal bad*.*

466 A. Lopez

We deduce that in a limit topology, minimal bad sequences are not allowed to use open subsets of arbitrary depth. This will then be leveraged via Lemma 3.27 to decrease the depth by one.

**Lemma 3.25.** *Let $\alpha$ be a limit ordinal, $\tau$ be a topology and $\mathsf{E}$ be a topology expander such that $\mathsf{E}^\beta(\tau)$ is Noetherian for all $\beta < \alpha$. Assume that $U = (U_i)_{i \in \mathbb{N}}$ is a minimal bad sequence of $\mathsf{E}^\alpha(\tau)$. Then, for every $i \in \mathbb{N}$, $\mathsf{depth}(U_i)$ is either $0$ or a successor ordinal.*

**Definition 3.26.** *Let $\alpha$ be an ordinal, $\tau$ be a topology, $\mathsf{E}$ be a topology expander such that $\tau \subseteq \mathsf{E}(\tau)$, and let $U \in \mathsf{E}^\alpha(\tau)$. The topology $\mathsf{Down}(U)$ is generated by the open sets $V$ such that $V \triangleleft U$, where $V$ ranges over $\mathsf{E}^\alpha(\tau)$.*

**Lemma 3.27.** *Let $\alpha$ be an ordinal, $\mathsf{E}$ be a topology expander and $U \in \mathsf{E}^\alpha(\tau)$. If $\mathsf{depth}(U)$ is a successor ordinal, then $U \in \mathsf{E}(\mathsf{Down}(U))$.*

If $U$ is a minimal bad sequence in $(X, \mathsf{E}^\alpha(\tau))$, then $U_i \not\subseteq \bigcup_{j<i} U_j =: V_i$, i.e., $U_i \cap V_i^c \neq \emptyset$. We can now use our subset restriction operator to devise a topology associated to this minimal bad sequence. Since $H_i := V_i^c$ is a closed set in $\mathsf{E}^\alpha(\tau)$, we can build the subset restriction $\mathsf{Down}(U_i)|H_i$.

**Definition 3.28.** *Let $\alpha$ be an ordinal, $\tau$ be a topology, $\mathsf{E}$ be a topology expander such that $\tau \subseteq \mathsf{E}(\tau)$, and let $U = (U_i)_{i \in \mathbb{N}}$ be a minimal bad sequence in $\mathsf{E}^\alpha(\tau)$. Then, the* minimal topology *$\mathsf{U}(\mathsf{E}^\alpha(\tau))$ is generated by $\bigcup_{i \in \mathbb{N}} \mathsf{Down}(U_i)|H_i$, where $H_i := (\bigcup_{j<i} U_j)^c$.*

**Lemma 3.29.** *Let $\alpha$ be an ordinal, $\tau$ be a topology, $\mathsf{E}$ be a topology expander such that $\tau \subseteq \mathsf{E}(\tau)$, and let $U = (U_i)_{i \in \mathbb{N}}$ be a minimal bad sequence in $\mathsf{E}^\alpha(\tau)$. Then, the minimal topology $\mathsf{U}(\mathsf{E}^\alpha(\tau))$ is Noetherian.*

*Proof.* Assume by contradiction that $\mathsf{U}(\mathsf{E}^\alpha(\tau))$ is not Noetherian. Let us define $V_i$ as $\bigcup_{j<i} U_j$, and $H_i$ as $V_i^c$.

Thanks to [12, Lemma 9.7.15] there exists a bad sequence $W := (W_i)_{i \in \mathbb{N}}$ of subbasic elements of $\mathsf{U}(\mathsf{E}^\alpha(\tau))$. By definition, each $W_i$ is in some $\mathsf{Down}(U_j)|H_j$. Let us select a mapping $\rho \colon \mathbb{N} \to \mathbb{N}$, such that $W_i \in \mathsf{Down}(U_{\rho(i)})|H_{\rho(i)}$. This amounts to the existence of an open $T_{\rho(i)}$, such that $T_{\rho(i)} \triangleleft U_{\rho(i)}$, and $W_i = T_{\rho(i)} \setminus V_{\rho(i)}$. Without loss of generality we assume that $\rho$ is monotonic.

Let us build the sequence $Y$ defined by $Y_i := U_i$ if $i < \rho(0)$ and $Y_i := T_{\rho(i)}$ otherwise. This is a sequence of open sets in $\mathsf{E}^\alpha(\tau)$ that is lexicographically smaller than $U$, hence $Y$ is a good sequence: there exists $i \in \mathbb{N}$ such that $Y_i \subseteq \bigcup_{j<i} Y_j$.


We are now ready to leverage our knowledge of minimal topologies associated with minimal bad sequences to carry on the proof of our main theorem.

**Proposition 3.20.** *Let $\alpha$ be an ordinal, $\tau$ be a topology, and $\mathsf{E}$ be a topology expander. If $\mathsf{E}^\beta(\tau)$ is Noetherian for all $\beta < \alpha$, and $\tau \subseteq \mathsf{E}(\tau)$, then $\mathsf{E}^\alpha(\tau)$ is Noetherian.*

*Proof.* If α is a successor ordinal, then α = β + 1 and E<sup>α</sup>(τ ) = E(E<sup>β</sup>(τ )). Because E respects Noetherian topologies, we immediately conclude that E<sup>α</sup>(τ ) is Noetherian. We are therefore only interested in the case where α is a limit ordinal.

Assume by contradiction that $\mathsf{E}^\alpha(\tau)$ is not Noetherian. By Lemma 3.24, there exists a minimal bad sequence $U := (U_i)_{i \in \mathbb{N}}$. Let us write $d_i := \mathsf{depth}(U_i) < \alpha$. Thanks to Lemma 3.25, $d_i$ is either $0$ or a successor ordinal.

Because $\mathsf{E}^\beta(\tau)$ is Noetherian for $\beta < \alpha$, there are finitely many open subsets $U_i$ at depth $\beta$ for every ordinal $\beta < \alpha$. Indeed, if there were infinitely many, one would extract an infinite bad sequence of opens in $\mathsf{E}^\beta(\tau)$, which is absurd.

Furthermore, the sequence $(d_i)_{i \in \mathbb{N}}$ must be monotonic, otherwise $U$ would not be lexicographically minimal. We can therefore construct a strictly increasing map $\rho \colon \mathbb{N} \to \mathbb{N}$ such that $0 < \mathsf{depth}(U_{\rho(j)})$ and $\mathsf{depth}(U_i) < \mathsf{depth}(U_{\rho(j)})$ whenever $0 \leq i < \rho(j)$.

Let us consider some $i = \rho(n)$ for some $n \in \mathbb{N}$. Let us write $V_i := \bigcup_{j<i} U_j$, and $H_i := X \setminus V_i$. The set $V_i$ is open in $\mathsf{Down}(U_i)$ by construction of $\rho$, hence $H_i$ is closed in $\mathsf{Down}(U_i)$. As $\mathsf{E}$ is a topology expander, we derive the following inclusions:

$$\begin{aligned} \mathsf{E}(\mathsf{Down}(U_i))|H_i &\subseteq \mathsf{E}(\mathsf{Down}(U_i)|H_i)|H_i\\ &\subseteq \mathsf{E}(\mathsf{U}(\mathsf{E}^\alpha(\tau)))|H_i\end{aligned}$$

Recall that $U_i \in \mathsf{E}(\mathsf{Down}(U_i))$ thanks to Lemma 3.27. As a consequence, $U_i \setminus V_i = W_i \setminus V_i$ for some open set $W_i$ in $\mathsf{E}(\mathsf{U}(\mathsf{E}^\alpha(\tau)))$. Thanks to Lemma 3.29, and preservation of Noetherian topologies through topology expanders, the latter is a Noetherian topology. Therefore, $(W_{\rho(i)})_{i \in \mathbb{N}}$ is a good sequence. This provides an $i \in \mathbb{N}$ such that $W_{\rho(i)} \subseteq \bigcup_{\rho(j) < \rho(i)} W_{\rho(j)}$. In particular,

$$\begin{aligned} U_{\rho(i)} \setminus V_{\rho(i)} = W_{\rho(i)} \setminus V_{\rho(i)} &\subseteq \bigcup_{\rho(j) < \rho(i)} W_{\rho(j)} \setminus V_{\rho(i)} \subseteq \bigcup_{\rho(j) < \rho(i)} W_{\rho(j)} \setminus V_{\rho(j)} \\ &\subseteq \bigcup_{\rho(j) < \rho(i)} U_{\rho(j)} \setminus V_{\rho(j)} \subseteq \bigcup_{j < \rho(i)} U_j = V_{\rho(i)} \end{aligned}$$

This proves that $U_{\rho(i)} \subseteq V_{\rho(i)}$, i.e. that $U_{\rho(i)} \subseteq \bigcup_{j < \rho(i)} U_j$. Finally, this contradicts the fact that $U$ is bad.

We have effectively proven that being well-behaved with respect to closed subspaces is enough to consider least fixed points of refinement functions. This behaviour should become clearer in the upcoming sections, where we illustrate how this property can be ensured both in the case of Noetherian spaces and well-quasi-orderings.

## **4 Applications of Topology Expanders**

We now briefly explore topologies that can be proven to be Noetherian using Theorem 3.21. It should not be surprising that both the topological Higman lemma and the topological Kruskal theorem fit in the framework of topology expanders, as both were already proven using a minimal bad sequence argument. However, we will proceed to extend the use of topology expanders to spaces for which the original proof did not use a minimal bad sequence argument, and illustrate how they can easily be used to define new Noetherian topologies.

**Finite words and finite trees.** As a first example, we can easily recover the *topological Higman lemma* [12, Theorem 9.7.33] because the subword topology is the least fixed point of $\mathsf{E}^\theta_{\mathsf{words}}$, which is a topology expander (see Lemmas 3.10 and 3.18).

It does not require much effort to generalise this proof scheme to the case of the *topological Kruskal theorem* [12, Theorem 9.7.46]. As a shorthand notation, let us write $t \in UV$ whenever there exists a subtree $t'$ of $t$ whose root is labelled by an element of $U$ and whose list of children belongs to $V$. Recall that we write $u \leq^{\ast} v$ when $u$ is a scattered subword of $v$, and $t \leq_{\mathsf{tree}} t'$ when $t$ embeds in $t'$ as a tree (see page 2). As for the subword topology, the definition is ad-hoc but correctly generalises the tree embedding relation because the tree topology of $\mathsf{alex}(\leq)$ is the Alexandroff topology of $\leq_{\mathsf{tree}}$, for every ordering $\leq$ over $\Sigma$ [12, Exercise 9.7.48].

**Definition 4.1 ([12, Definition 9.7.39]).** *Let $(\Sigma, \theta)$ be a topological space. The space $\mathsf{T}(\Sigma)$ of finite trees over $\Sigma$ can be endowed with the* tree topology*, the coarsest topology such that $UV$ is open whenever $U$ is an open set of $\Sigma$, and $V$ is an open set of $\mathsf{T}(\Sigma)^{\ast}$ in its subword topology.*

**Definition 4.2.** *Let $(\Sigma, \theta)$ be a topological space. Let $\mathsf{E}^\theta_{\mathsf{tree}}$ be the function that maps a topology $\tau$ to the topology generated by the sets ${\uparrow}_{\leq_{\mathsf{tree}}} UV$, for $U$ open in $\theta$, $V$ open in $\mathsf{T}(\Sigma)^{\ast}$ with the subword topology of $\tau$.*

**Lemma 4.3.** *The tree topology is the least fixed point of $\mathsf{E}^\theta_{\mathsf{tree}}$, which is a topology expander. Hence, the tree topology is Noetherian when $\theta$ is.*

**Ordinal words.** Let us now demonstrate how Theorem 3.21 can be applied over spaces which are proved to be Noetherian without using a minimal bad sequence argument. For that, let us consider $\Sigma^{<\alpha}$ the set of words of ordinal length less than $\alpha$, where $\alpha$ is a fixed ordinal. Since $\leq^{\ast}$ is in general not a wqo on $\Sigma^{<\alpha}$ when $\leq$ is wqo on $\Sigma$, this also provides an example of a topological minimal bad sequence argument that has no counterpart in the realm of wqos.

**Definition 4.4 ([15]).** *Let $(\Sigma, \theta)$ be a topological space. The* ordinal subword topology *over $\Sigma^{<\alpha}$ is the topology generated by the closed sets $F_1^{<\beta_1} \cdots F_n^{<\beta_n}$, for $n \in \mathbb{N}$, $F_i$ closed in $\theta$, $\beta_i < \alpha$, and where $F^{<\beta}$ is the set of words of length less than $\beta$ with all of their letters in $F$.*

The ordinal subword topology is Noetherian [15], but the proof is quite technical and relies on the in-depth study of the possible inclusions between the subbasic closed sets. Before defining a suitable topology expander, given an ordinal $\beta$ and a set $U \subseteq \Sigma^{<\alpha}$, let us write $w \in \beta U$ if and only if $w_{>\gamma} \in U$ for all $0 \leq \gamma < \beta$.

**Definition 4.5.** *Let $(\Sigma, \theta)$ be a topological space, and $\alpha$ be an ordinal. The function $\mathsf{E}^\theta_{\alpha\text{-words}}$ maps a topology $\tau$ to the topology generated by the following sets: ${\uparrow}_{\leq^{\ast}} UV$ for $U, V$ opens in $\tau$; ${\uparrow}_{\leq^{\ast}} \beta U$, for $U$ open in $\tau$, $\beta \leq \alpha$; ${\uparrow}_{\leq^{\ast}} W$, for $W$ open in $\theta$.*

**Lemma 4.6.** *Let $(\Sigma, \theta)$ be a Noetherian space and $\alpha$ be an ordinal. The map $\mathsf{E}^\theta_{\alpha\text{-words}}$ is a topology expander, whose least fixed point contains the ordinal subword topology. Therefore, the ordinal subword topology is Noetherian.*

Remark that Definitions 4.2, 4.5 and 3.9 all follow the same blueprint: new open sets are built as upwards closure for the corresponding quasi-order of the natural constructors associated to the space. We argue that this blueprint mitigates the canonicity issue and the complexity of Definitions 4.1, 4.4 and 3.8.

**Ordinal branching trees.** As an example of a new Noetherian topology derived using Theorem 3.21, we will consider *$\alpha$-branching trees* $\mathsf{T}^{<\alpha}(\Sigma)$, i.e., the least fixed point of the constructor $X \mapsto \mathbf{1} + \Sigma \times X^{<\alpha}$ where $\alpha$ is a given ordinal. This space was not known to be Noetherian and fails to be a well-quasi-order; it illustrates how Theorem 3.21 easily applies to inductively defined spaces.

**Definition 4.7.** *Let $(\Sigma, \theta)$ be a Noetherian space. The* ordinal tree topology *over $\alpha$-branching trees is the least fixed point of $\mathsf{E}^\theta_{\alpha\text{-trees}}$, mapping a topology $\tau$ to the topology generated by the sets ${\uparrow}_{\leq_{\mathsf{tree}}} UV$, where $U \in \theta$, $V$ is open in $(\mathsf{T}^{<\alpha}(\Sigma))^{<\alpha}$ with the ordinal subword topology, and $UV$ is the set of trees whose root is labelled by an element of $U$ and whose list of children belongs to $V$.*

**Theorem 4.8.** *The* α*-branching trees endowed with the ordinal tree topology form a Noetherian space.*

*Proof.* It suffices to prove that E<sup>θ</sup><sub>α-trees</sub> is a topology expander. It is clear that E<sup>θ</sup><sub>α-trees</sub> is monotone, and that a closed set of E<sup>θ</sup><sub>α-trees</sub>(τ) is always downwards closed for ≤<sub>tree</sub>. As a consequence, if τ ⊆ E<sup>θ</sup><sub>α-trees</sub>(τ) and H is closed in τ, then t ∈ V := (↑<sub>≤tree</sub> UV) ∩ H if and only if t ∈ H and every child of t belongs to H. Therefore, (↑<sub>≤tree</sub> UV) ∩ H = (↑<sub>≤tree</sub> UV ∩ H<α) ∩ H. Notice that H<α ∩ V is an open set of the ordinal subword topology over τ|H. As a consequence, V ∩ H ∈ E<sup>θ</sup><sub>α-trees</sub>(τ|H)|H.

Let us now check that E<sup>θ</sup><sub>α-trees</sub> preserves Noetherian topologies. Let W<sub>i</sub> := ↑<sub>≤tree</sub> U<sub>i</sub>V<sub>i</sub> be an ℕ-indexed sequence of open sets in E<sup>θ</sup><sub>α-trees</sub>(τ), where τ is Noetherian. The product of the topology θ and the ordinal subword topology over τ is Noetherian thanks to Table 1 and Lemma 4.6. Hence, there exists an i ∈ ℕ such that U<sub>i</sub> × V<sub>i</sub> ⊆ ⋃<sub>j < i</sub> U<sub>j</sub> × V<sub>j</sub>. As a consequence, W<sub>i</sub> ⊆ ⋃<sub>j < i</sub> W<sub>j</sub>. We have proven that E<sup>θ</sup><sub>α-trees</sub>(τ) is Noetherian.


At this point, we have shown that the framework of topology expanders allows us to build non-trivial Noetherian spaces. We argue that this bears several advantages over ad-hoc proofs: (i) ad-hoc proofs are often tedious and error-prone [12, 13, 15]; (ii) verifying that E is a topology expander is, by contrast, quite simple; (iii) the canonicity issue of topologies reduces to the choice of a suitable topology expander.

## **5 Consequences on inductive definitions**

So far, the process of constructing Noetherian spaces has been the following: first build a set of points, then compute a topology that is Noetherian as a least fixed point. In the case where the set of points itself is inductively defined (such as finite words or finite trees), the second step might seem redundant, and getting rid of it provides a satisfactory answer to the canonicity concerns about Noetherian topologies.

Before studying inductive definitions of topological spaces, the notion of least fixed point in this setting has to be made precise. To that purpose, let us now introduce some basic notions of category theory. In this paper only three categories will appear: the category Set of sets and functions, the category Top of topological spaces and continuous maps, and the category Ord of quasi-ordered spaces and monotone maps. Using this language, a unary constructor G in the algebra of wqos defines an endofunctor from objects of the category Ord to objects of the category Ord preserving well-quasi-orderings.

Notation 5.1. Recall that in a category C, Hom(A, B) denotes the collection of morphisms from the object A to the object B in C. Moreover, Aut(A) denotes the set of automorphisms of A, i.e., invertible elements of (Hom(A, A), ◦).

In our study of Noetherian spaces (resp. well-quasi-orderings), we will often see constructors G as first building a new set of structures, and then adapting the topology (resp. ordering) to this new set. In categorical terms, we are interested in endofunctors G that are U-lifts of endofunctors on Set, where U is the forgetful functor from Top (resp. Ord) to Set.

#### **5.1 Divisibility Topologies of Analytic Functors**

The goal of this section is to introduce the categorical framework needed to formalise the automatic definition of a topology over an inductively defined datatype, and to compare this definition with the existing work on well-quasi-orders by Hasegawa [17] and Freund [8]. We will avoid as much as possible the use of complex machinery related to analytic functors, and use as a definition an equivalent characterisation given by Hasegawa [17, Theorem 1.6]. For an introduction to analytic functors and combinatorial species, we refer the reader to Joyal [20].

Notation 5.2. Given G an endofunctor of Set, the category of elements el(G) has as objects pairs (E, a) with a ∈ G(E), and as morphisms between (E, a) and (E′, a′) maps f : E → E′ such that G<sub>f</sub>(a) = a′.

As an intuition for the unfamiliar reader, an element (E, a) in el(G) is a witness that a can be produced through G by using elements of E. Morphisms of elements witness how relations between elements of G(E) and G(E′) arise from relations between E and E′. As a way to define a "smallest" set of elements E such that a can be found in G(E), we rely on transitive objects. We recall that in a category C, if X, A are two objects, the action of Aut(X) on Hom(X, A) is transitive when for every pair f, g ∈ Hom(X, A), there exists an h ∈ Aut(X) such that f ◦ h = g.

Notation 5.3. A transitive object in a category C is an object X satisfying the following two conditions for every object A of C: (a) the set Hom(X, A) in C is non-empty; (b) the right action of Aut(X) on Hom(X, A) by composition is transitive.

Notation 5.4. Given an object A in a category C, one can build the slice category C/A, whose objects are the elements of Hom(B, A) when B ranges over objects of C, and whose morphisms between c<sub>1</sub> ∈ Hom(B<sub>1</sub>, A) and c<sub>2</sub> ∈ Hom(B<sub>2</sub>, A) are maps f : B<sub>1</sub> → B<sub>2</sub> such that c<sub>2</sub> ◦ f = c<sub>1</sub>.

This notion of slice category can be combined with the one of transitive object to build so-called "weak normal forms".

Notation 5.5. A weak normal form of an object A in a category C is a transitive object in C/A.

A category C has the weak normal form property whenever every object A has a weak normal form. We are now ready to formulate a definition of analytic functors through the existence of weak normal forms for objects in their category of elements.

Notation 5.6. An endofunctor G of Set is an analytic functor whenever its category of elements el(G) has the weak normal form property and, moreover, X is a finite set for every weak normal form f ∈ Hom((X, x), (Y, y)) in el(G)/(Y, y).

Example 5.7. The functor mapping X to X<sup>∗</sup> is analytic, and the weak normal form of a word (X∗, w) is (letters(w), w) together with the canonical injection from letters(w) to X. In this specific case, the weak normal forms are in fact initial objects.
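Example 5.7 can be made concrete with a small sketch. The code below is an illustration only: the names `fmap_word`, `letters` and `factors_through` are hypothetical and not taken from the paper. It models the action of the functor X ↦ X<sup>∗</sup> on functions and checks that a word already lives over its finite set of letters, which is the carrier of its weak normal form.

```python
from typing import Callable, Hashable, Iterable, Sequence, Tuple

Word = Tuple[Hashable, ...]


def fmap_word(f: Callable, w: Sequence) -> Word:
    """Action of the functor X -> X* on a function f: apply f to each letter."""
    return tuple(f(x) for x in w)


def letters(w: Sequence) -> frozenset:
    """Finite set of letters occurring in w (carrier of the weak normal form)."""
    return frozenset(w)


def factors_through(w: Sequence, subset: Iterable) -> bool:
    """True when w is the image of a word over `subset` under the inclusion
    of `subset` into the ambient alphabet, i.e. w only uses letters of `subset`."""
    allowed = set(subset)
    return all(x in allowed for x in w)


w = ("a", "b", "a")
support = letters(w)  # frozenset({"a", "b"})
```

Any strictly smaller set of letters fails the `factors_through` check, which matches the intuition that letters(w) is the least set over which w can be built.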

Example 5.8. The functor mapping X to X<α is not analytic when α ≥ ω, because of the restriction that weak normal forms are defined using finite sets.

Let us now explain how these weak normal forms can be used to define a support associated to the analytic functor, which in turn allows us to build a notion of substructure ordering over initial algebras of analytic functors.

**Definition 5.9.** Let G be an analytic functor, (X, x) be an element in el(G) and f ∈ Hom((Y, y), (X, x)) be a weak normal form in the slice category el(G)/(X, x). We define f(Y) as the support of x in X, written supp<sub>X</sub>(x).

**Definition 5.10.** Let G be an analytic functor and (μG, δ) be an initial algebra of G. We say that a ∈ μG is a child of b ∈ μG whenever a = b or a ∈ supp<sub>μG</sub>(δ<sup>−1</sup>(b)). The transitive closure of the children relation is called the substructure ordering of μG and written .

Example 5.11. The substructure ordering on μG for G(X) := **1** + Σ × X is the suffix ordering of words.
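Example 5.11 can be checked on a small encoding. In the sketch below (an assumption-laden illustration, with hypothetical names `from_string`, `support` and `substructures`), a word is an element of the initial algebra of G(X) = **1** + Σ × X: `None` stands for the unique element of **1**, and `(a, rest)` for a letter followed by a word. The support of the unfolding of a non-empty word is its tail, so closing the children relation under transitivity enumerates exactly the suffixes.

```python
from typing import Iterator, List, Optional, Tuple

# None is the empty word (the element of 1); (a, rest) is the letter a
# followed by the word rest.
Word = Optional[Tuple[str, "Word"]]


def from_string(s: str) -> Word:
    """Encode a Python string as a nested pair structure."""
    w: Word = None
    for a in reversed(s):
        w = (a, w)
    return w


def to_string(w: Word) -> str:
    """Decode a nested pair structure back into a Python string."""
    out = []
    while w is not None:
        out.append(w[0])
        w = w[1]
    return "".join(out)


def support(b: Word) -> List[Word]:
    """Immediate sub-words used to build b: empty for the empty word,
    the tail for a non-empty word."""
    return [] if b is None else [b[1]]


def substructures(b: Word) -> Iterator[Word]:
    """All a below b for the reflexive-transitive closure of 'child of'."""
    yield b
    for child in support(b):
        yield from substructures(child)
```

For instance, `[to_string(x) for x in substructures(from_string("abc"))]` lists the word itself followed by its proper suffixes, ending with the empty word.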

We leverage the notion of substructure ordering to define a suitable topology expander over initial algebras of analytic functors. Note that this ordering appears implicitly in the construction of Hasegawa [17, Definition 2.7].

**Definition 5.12.** Let G′ : Top → Top be a lifting of an analytic functor G, and (μG, δ) an initial algebra of G. We define E<sup>G′</sup><sub>♦</sub>, which maps τ to the topology generated by ↑ δ(U) where U ∈ G′(μG, τ).

We say that lfp<sub>τ</sub>.E<sup>G′</sup><sub>♦</sub> is the divisibility topology over μG.

**Theorem 5.13.** Let G′ : Top → Top be a lifting of an analytic functor G, and (μG, δ) an initial algebra of G. Moreover, we suppose that G preserves inclusions. The map E<sup>G′</sup><sub>♦</sub> is a topology expander, hence the divisibility topology is Noetherian.

As a sanity check, we can apply Theorem 5.13 to the sets of finite words and finite trees, and recover the subword topology and the tree topology that were obtained in an ad-hoc fashion in Section 4. In addition to validating the usefulness of Theorem 5.13, we believe that this strongly indicates that the topologies introduced prior to this work were the right generalisations of Higman's word embedding and Kruskal's tree embedding in a topological setting, and that it addresses the canonicity issue of the aforementioned topologies.

**Lemma 5.14.** The subword topology over Σ∗, (resp. the tree topology over T(Σ)) is the divisibility topology associated to the inductive construction of finite words (resp. finite trees).

## **5.2 Divisibility Preorders**

We are now going to prove that the divisibility topology correctly generalises the corresponding notions on quasi-orderings. In the case of finite words, this translates to the equation alex(≤)<sup>∗</sup> = alex(≤<sup>∗</sup>) [12, Exercise 9.7.30]. We relate the divisibility topology to the divisibility preorder introduced by Hasegawa [17, Definition 2.7].

**Theorem 5.15.** Let G be the lift of an analytic functor respecting Alexandroff topologies, Noetherian spaces, and embeddings. Then, the divisibility topology of μG is the Alexandroff topology of the divisibility preorder of μG, which is a well-quasi-ordering.
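For finite words, the quasi-order ≤<sup>∗</sup> in the equation alex(≤)<sup>∗</sup> = alex(≤<sup>∗</sup>) is the subword (Higman) embedding built from a quasi-order ≤ on letters. The following sketch decides it with a greedy left-to-right scan; the function name `subword_embeds` and the callback parameter `leq` are hypothetical, and the code says nothing about the topological side of the statement.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def subword_embeds(u: Sequence[A], w: Sequence[A],
                   leq: Callable[[A, A], bool]) -> bool:
    """Return True when u embeds into w for the subword ordering:
    there are positions j_0 < j_1 < ... in w with leq(u[i], w[j_i]) for all i.
    Matching each letter of u as early as possible is sufficient."""
    i = 0  # index of the next letter of u that still has to be matched
    for b in w:
        if i < len(u) and leq(u[i], b):
            i += 1
    return i == len(u)
```

With equality as `leq` this is the usual scattered-subword test; with `<=` on integers, `[1, 3]` embeds into `[0, 2, 5, 1, 4]` while `[3, 3]` does not embed into `[1, 4, 2]`.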

## **6 Outlook**

We have provided a systematic way to place a Noetherian topology over an inductively defined datatype, which is correct with respect to its wqo counterpart whenever it exists. As a byproduct, we obtained a uniform framework that simplifies existing proofs, and serves as an indicator that the pre-existing topologies were the "right generalisations" of their quasi-order counterparts. Let us now briefly highlight some interesting properties of the underlying theory.

**Differences with the existing categorical frameworks.** The existing categorical frameworks are built around a specific kind of functors [17, 8], while the notion of topology expander only requires talking about one specific set. This allows proving that the ordinal subword topology and the α-branching trees are Noetherian, while these escape both the realm of wqos, and of "well-behaved functors" having finite support functions.

**Quasi-analytic functors.** In fact, the proof of Theorem 5.13 never relies on the finiteness of the support of an element. This means that the definition of analytic functors can be loosened to allow non-finite weak normal forms. We do not know whether this notion of "quasi-analytic functor" already exists in the literature.

**Transfinite iterations.** As the reader might have noticed, all of the least fixed points considered in this paper are obtained using at most ω steps. This is because the topology expanders that are presented in the paper are all Scott-continuous, i.e., they satisfy the equation E(sup<sub>i</sub> τ<sub>i</sub>) = sup<sub>i</sub> E(τ<sub>i</sub>). While Theorem 3.21 does apply to non-Scott-continuous topology expanders, we do not know any reasonable example of such an expander.
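The fact that a Scott-continuous map reaches its least fixed point by iterating from the bottom element is the usual Kleene iteration. As a toy illustration only, the sketch below iterates a monotone map on finite sets of integers; it is not a topology expander, and the names `lfp` and `step` are hypothetical.

```python
from typing import Callable, FrozenSet


def lfp(step: Callable[[FrozenSet[int]], FrozenSet[int]],
        bottom: FrozenSet[int] = frozenset(),
        max_iters: int = 10_000) -> FrozenSet[int]:
    """Iterate `step` from `bottom` until the value stabilises."""
    current = bottom
    for _ in range(max_iters):
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no fixed point reached within max_iters")


def step(s: FrozenSet[int]) -> FrozenSet[int]:
    """Monotone map: keep s, add 0, and add x + 2 for every x in s below the bound."""
    return frozenset(s) | {0} | {x + 2 for x in s if x + 2 < 10}
```

Here `lfp(step)` stabilises after finitely many rounds on the even numbers below 10; on an infinite carrier the same scheme would need the full ω-indexed supremum.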

**Lack of ordinal invariants.** Even though our proof that the ordinal subword topology is Noetherian is shorter than the original one, it actually provides less information. In particular, it does not provide a bound for the ordinal rank of the lattice of closed sets (called the stature of Σ<α), whereas a clear bound is provided by the previous approach of Goubault-Larrecq et al. [15, Proposition 33]. This limitation already appears in the existing categorical frameworks [17, 8], and we believe that it is inherent to the use of minimal bad sequence arguments.

**Acknowledgements.** I thank the anonymous reviewers for their helpful suggestions. I thank Jean Goubault-Larrecq and Sylvain Schmitz for their help and support in writing this paper, together with Simon Halfon for his insight on transfinite words.


## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## An Efficient Cyclic Entailment Procedure in a Fragment of Separation Logic

Quang Loc Le<sup>1</sup> and Xuan-Bach D. Le<sup>2</sup>

<sup>1</sup> Department of Computer Science, University College London, London, UK loc.le@ucl.ac.uk

<sup>2</sup> School of Computing and Information Systems, University of Melbourne, Melbourne, Australia bach.le@unimelb.edu.au

Abstract. An efficient entailment proof system is essential to compositional verification using separation logic. Unfortunately, existing decision procedures are either inexpressive or inefficient. For example, Smallfoot is an efficient procedure but only works with hardwired lists and trees. Other procedures that can support general inductive predicates run exponentially in time as their proof search requires back-tracking to deal with a disjunction in the consequent.

This paper presents a decision procedure to derive cyclic entailment proofs for general inductive predicates in polynomial time. Our procedure is efficient and does not require back-tracking; it uses normalisation rules that help avoid the introduction of disjunction in the consequent. Moreover, our decidable fragment is sufficiently expressive: It is based on compositional predicates and can capture a wide range of data structures, including sorted and nested list segments, skip lists with fast-forward pointers, and binary search trees. We implemented the proposal in a prototype tool, called S2SLin, and evaluated it over challenging problems from a recent separation logic competition. The experimental results confirm the efficiency of the proposed system.

Keywords: Cyclic Proofs, Entailment Procedure, Separation Logic.

## 1 Introduction

Separation logic [20,37] has been used successfully to reason about programs manipulating pointer structures. It empowers reusability and scalability through compositional reasoning [6,7]. A compositional verification system relies on bi-abduction technology which is, in turn, based on entailment proof systems. The entailment problem is defined as follows: given an antecedent A and a consequent C, where A and C are formulas in separation logic, check whether A |= C is valid. Thus, an efficient decision procedure for entailments is the vital ingredient of an automatic verification system in separation logic.

To enhance the expressiveness of the assertion language, for example, to specify unbounded heaps and interesting pure properties (e.g., sortedness, parent pointers), separation logic is typically combined with user-defined inductive predicates [9,31,35]. In this setting, one key challenge of an entailment procedure is the ability to support induction reasoning over the combination of heaps and data content. The problem of induction is challenging, especially for an automated inductive theorem prover, where the induction rules are not explicitly stated. Indeed, this problem is undecidable [1].

Developing a sound and complete entailment procedure that could be used for compositional reasoning is not trivial. It is unknown how model-based systems, e.g. [14,15,17,18,22,23], could support compositional reasoning. In contrast, there is evidence that proof-based decision procedures, e.g., Smallfoot [2] and its variant [12], and Cycomp [42], can be extended to solve the bi-abduction problem, which enables compositional reasoning and scalability [7,25]. Smallfoot was the centre of the bi-abductive procedure deployed in Infer [7], which greatly impacted academia and industry [13]. Furthermore, Smallfoot is very efficient due to its use of the "exclude-the-middle" rule, which can avoid the proof search over the disjunction in the consequent. However, Smallfoot works for hardwired lists and binary trees only. In contrast, Cycomp, a recent complete entailment procedure, is a cyclic proof system without "exclude-the-middle" and can support general inductive predicates but has double-exponential time complexity due to the proof search (and back-tracking) in the consequent.

This paper introduces a cyclic proof system with an "exclude-the-middle"-styled decision procedure for decidable yet expressive inductive predicates. In particular, we show that our procedure runs in polynomial time when the maximum number of fields of data structures is bounded by a constant. The decidable fragment, SHLIDe, contains inductive definitions of compositional predicates and pure properties. These predicates can capture nested list segments, skip lists and trees. The pure properties of small models can model a wide range of common data structures, e.g. a list with fast-forward pointers, sorted nested lists, and binary search trees [22,32]. This fragment is much more expressive than Smallfoot's and is incomparable to Cycomp's [42]: there exist some entailments our system can handle but Cycomp cannot, and vice versa.

Our procedure is a variant of the cyclic proof system introduced by Brotherston [3,5], which has become one of the leading solutions to induction reasoning in separation logic. Intuitively, a cyclic proof is naturally represented as a tree of statements (entailments in this paper). The leaves are either axioms or nodes linked back to inner nodes; the tree's root is the theorem to be proven, and nodes are connected to one or more children by proof rules. Alternatively, a cyclic proof can be viewed as a tree possibly containing some back-links (a.k.a. cycles, e.g., "C, if B, if C") such that the proof satisfies some global soundness condition. This condition ensures that the proof can be viewed as a proof of *infinite descent*. For instance, for a cyclic entailment proof with inductive definitions, if every cycle contains an unfolding of some inductive predicate, then that predicate is infinitely often reduced into a strictly "smaller" predicate. Such an infinite reduction is impossible, as the semantics of inductive definitions only allows finitely many unfolding steps. Hence, that proof path with the cycle can be disregarded.
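The intuition that "every cycle contains an unfolding" can be phrased as a simple graph check. The sketch below is a deliberate simplification under stated assumptions: a proof is given as a list of hypothetical edges `(source, target, unfolds)`, where `unfolds` records whether the step unfolds an inductive predicate, and the check succeeds when the edges without unfolding contain no cycle. The global soundness condition of cyclic proofs is stated in terms of traces and is finer than this; the function name `every_cycle_unfolds` is not from the paper.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, bool]  # (source node, target node, step unfolds a predicate)


def every_cycle_unfolds(edges: Iterable[Edge]) -> bool:
    """True when every cycle of the proof graph uses at least one unfolding edge,
    i.e. when the subgraph of non-unfolding edges is acyclic."""
    graph: Dict[str, List[str]] = {}
    for src, dst, unfolds in edges:
        graph.setdefault(src, [])
        graph.setdefault(dst, [])
        if not unfolds:
            graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {node: WHITE for node in graph}

    def has_cycle(node: str) -> bool:
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:
                return True
            if colour[succ] == WHITE and has_cycle(succ):
                return True
        colour[node] = BLACK
        return False

    return not any(colour[n] == WHITE and has_cycle(n) for n in graph)
```

A back-link that closes a cycle through at least one unfolding step is accepted, whereas a cycle made only of non-unfolding steps is rejected.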

The proposed system advances Brotherston's system in three ways. First, the proposed proof search algorithm is specialized to SHLIDe, which includes "exclude-the-middle" rules and excludes any back-tracking. The existing proof procedures typically search for a proof (and back-track) over disjunctive cases generated from unfolding inductive predicates in the RHS of an entailment. To avoid such costly searches, we propose "exclude-the-middle"-styled normalised rules in which the unfolding of inductive predicates in the RHS always produces one disjunct. Therefore, our system is much more efficient than existing systems. Second, while a standard Brotherston system is incomplete, our proof search is complete in SHLIDe: if it is stuck (i.e., it cannot apply any inference rule), then the root entailment is invalid.

Lastly, while the global soundness in [5] must be checked globally and explicitly, every back-link generated in SHLIDe is sound by design. We note that Cycomp, introduced in [42], was the first work to show the completeness of a cyclic proof system. However, in contrast to ours, it did not discuss the global soundness condition, which is the crucial idea contributing to the soundness of cyclic proofs.

*Contributions* Our primary contributions are summarized as follows.


*Organization* The remainder of the paper is organised as follows. Sect. 2 describes the syntax of formulas in fragment SHLIDe. Sect. 3 presents the basics of an "exclude-the-middle" proof system and cyclic proofs. Sect. 4 elaborates on the result, the novel cyclic proof system, including an illustrative example. Sect. 5 discusses soundness and completeness. Sect. 6 presents the implementation and evaluation. Sect. 7 discusses related work. Finally, Sect. 8 concludes the work.

## 2 Decidable Fragment SHLIDe

Subsection 2.1 presents syntax of separation logic formulae and recursive definitions of linear predicates and local properties. Subsection 2.2 shows semantics.

### 2.1 Separation Logic Formulas

Concrete heap models assume a fixed finite collection of data structures *Node*, a fixed finite collection of field names *Fields*, a set *Loc* of locations (heap addresses), and a set of non-addressable values *Val*, with the requirement that *Val* ∩ *Loc* = ∅ (i.e., no pointer arithmetic). null is a special element of *Val*. ℤ denotes the set of integers (ℤ ⊆ *Val*) and k denotes an integer. *Var* is an infinite set of variables, and v̄ denotes a sequence of variables.

*Syntax* Disjunctive formula Φ, symbolic heaps Δ, spatial formula κ, pure formula π, pointer (dis)equality φ, and (in)equality formula α are as follows.

$$\begin{array}{lll} \Phi ::= \Delta \mid \Phi \vee \Phi & \Delta ::= \kappa \wedge \pi \mid \exists v.\, \kappa \wedge \pi & \pi ::= \mathtt{true} \mid \alpha \mid \neg \pi \mid \pi \wedge \pi \\ \kappa ::= \mathtt{emp} \mid x \mapsto c(f{:}v, \ldots, f{:}v) \mid \mathsf{P}(\bar{v}) \mid \kappa \ast \kappa & \alpha ::= a = a \mid a \leq a & a ::= k \mid v \end{array}$$

where v ∈ *Var*, c ∈ *Node* and f ∈ *Fields*. Note that we often discard the field names f of points-to predicates x↦c(f:v, .., f:v) and use the short form x↦c(v̄). v<sub>1</sub> ≠ v<sub>2</sub> is the short form of ¬(v<sub>1</sub> = v<sub>2</sub>). E denotes either a variable or null. Δ[E/v] denotes the formula obtained from Δ by replacing v by E. *A symbolic heap is referred to as a base, denoted* Δ<sup>b</sup>*, if it does not contain any occurrence of inductive predicates.*


*Inductive Definitions* We write P to denote a set of n defined predicates P = {P<sub>1</sub>, ..., P<sub>n</sub>} in our system. Each inductive predicate has the following types of parameters: a pair of root and segment defining segment-based linked points-to heaps, reference parameters (e.g., parent pointers, fast-forwarding pointers), transitivity parameters (e.g., singly-linked lists where every heap cell contains the same value a), and pairs of ordering parameters (e.g., trees being binary search trees). An inductive predicate is defined as

$$\begin{array}{l} \mathsf{pred} \, \mathsf{P}(r, F, \bar{B}, u, sc; tg) \equiv \mathsf{emp} \wedge r = F \wedge sc = tg \\ \lor \, \exists X\_{tl}, \bar{Z}, sc'. r \mapsto c(X\_{tl}, \bar{p}, u, sc') \ast \kappa' \ast \mathsf{P}(X\_{tl}, F, \bar{B}, u, sc', tg) \land r \neq F \wedge sc \diamond sc' = tg \end{array}$$

where r is the root, F the segment, B̄ the borders, u the parameter for a transitivity property, sc and tg the source and target parameters, respectively, of an order property, r ↦ c(X<sub>tl</sub>, p̄, u, sc′) ∗ κ′ the matrix of the heaps, and ⋄ ∈ {=, ≥, ≤}. (The extension for multiple local properties is straightforward.) Moreover, this definition is constrained by the following three conditions on heap connectivity, establishment, and termination.

Condition C1. In the recursive rule, p̄ = {null} ∪ Z̄. This condition implies that if two variables point to the same heap, their contents must be the same. For instance, the following definition of singly-linked lists of even length does not satisfy this condition.

$$\mathsf{pred}\ \mathsf{ell}(r, F) \equiv \mathsf{emp} \land r = F \lor \exists x_1, X.\, r \mapsto c_1(x_1) \ast x_1 \mapsto c_1(X) \ast \mathsf{ell}(X, F) \land r \neq F$$

as n<sup>3</sup> and X are not field variables of the node pointed-to by r.

Condition C2. The matrix heap defines nested and connected list segments as:

$$\kappa' ::= \mathsf{Q}(Z, \bar{U}) \mid \kappa' \ast \kappa' \mid \mathsf{emp}.$$

where Z ∈ p̄ and (Ū \ p̄) ∩ Z = ∅. This condition ensures connectivity (i.e., all allocated heaps are connected to the root) and establishment (i.e., every existential quantifier is either allocated or equal to a parameter).

Condition C3. There is no mutual recursion. We define an order ≺<sub>P</sub> on inductive predicates as: P ≺<sub>P</sub> Q if at least one occurrence of predicate Q appears in the definition of P; Q is then called a direct sub-term of P. We use ≺<sup>∗</sup><sub>P</sub> to denote the transitive closure of ≺<sub>P</sub>.

Several definition examples are shown as follows.

$$\begin{array}{l}
\mathsf{pred}\ \mathsf{ll}(r, F) \equiv \mathsf{emp} \land r = F \lor \exists X_{tl}.\, r \mapsto c_1(X_{tl}) \ast \mathsf{ll}(X_{tl}, F) \land r \neq F \\
\mathsf{pred}\ \mathsf{nll}(r, F, B) \equiv \mathsf{emp} \land r = F \\
\quad \lor\ \exists X_{tl}, Z.\, r \mapsto c_3(X_{tl}, Z) \ast \mathsf{ll}(Z, B) \ast \mathsf{nll}(X_{tl}, F, B) \land r \neq F \\
\mathsf{pred}\ \mathsf{skl1}(r, F) \equiv \mathsf{emp} \land r = F \lor \exists X_{tl}.\, r \mapsto c_4(X_{tl}, \mathsf{null}, \mathsf{null}) \ast \mathsf{skl1}(X_{tl}, F) \land r \neq F \\
\mathsf{pred}\ \mathsf{skl2}(r, F) \equiv \mathsf{emp} \land r = F \\
\quad \lor\ \exists X_{tl}, Z_1.\, r \mapsto c_4(Z_1, X_{tl}, \mathsf{null}) \ast \mathsf{skl1}(Z_1, X_{tl}) \ast \mathsf{skl2}(X_{tl}, F) \land r \neq F \\
\mathsf{pred}\ \mathsf{skl3}(r, F) \equiv \mathsf{emp} \land r = F \\
\quad \lor\ \exists X_{tl}, Z_1, Z_2.\, r \mapsto c_4(Z_1, Z_2, X_{tl}) \ast \mathsf{skl1}(Z_1, Z_2) \ast \mathsf{skl2}(Z_2, X_{tl}) \ast \mathsf{skl3}(X_{tl}, F) \land r \neq F \\
\mathsf{pred}\ \mathsf{tree}(r, B) \equiv \mathsf{emp} \land r = B \\
\quad \lor\ \exists r_l, r_r.\, r \mapsto c_t(r_l, r_r) \ast \mathsf{tree}(r_l, B) \ast \mathsf{tree}(r_r, B) \land r \neq B
\end{array}$$

ll defines singly-linked lists, nll defines lists of acyclic lists, skl1, skl2 and skl3 define skip-lists. Finally, tree defines binary trees. We extend predicate ll with transitivity and order parameters to obtain predicates lla and lls, respectively, as follows.

$$\begin{array}{l}
\mathsf{pred}\ \mathsf{lla}(r, F, a) \equiv \mathsf{emp} \land r = F \lor \exists X_{tl}.\, r \mapsto c_2(X_{tl}, a) \ast \mathsf{lla}(X_{tl}, F, a) \land r \neq F \\
\mathsf{pred}\ \mathsf{lls}(r, F, mi, ma) \equiv \mathsf{emp} \land r = F \land ma = mi \\
\quad \lor\ \exists X_{tl}, mi_1.\, r \mapsto c_4(X_{tl}, mi_1) \ast \mathsf{lls}(X_{tl}, F, mi_1, ma) \land r \neq F \land mi \leq mi_1
\end{array}$$
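Condition C3 lends itself to a mechanical check on the dependency graph of the predicate definitions. The sketch below assumes a hypothetical encoding in which `defs` maps each predicate name to the set of predicate names occurring in its definition (so P ≺<sub>P</sub> Q when Q is in `defs[P]`). Self-recursion is allowed; the check only rejects two distinct predicates that reach each other through the transitive closure.

```python
from typing import Dict, Set


def reachable(defs: Dict[str, Set[str]], start: str) -> Set[str]:
    """Predicates reachable from `start` through one or more dependency steps."""
    seen: Set[str] = set()
    stack = list(defs.get(start, ()))
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(defs.get(q, ()))
    return seen


def has_mutual_recursion(defs: Dict[str, Set[str]]) -> bool:
    """True when two distinct predicates depend on each other, directly or not."""
    reach = {p: reachable(defs, p) for p in defs}
    return any(q != p and p in reach.get(q, set())
               for p in defs for q in reach[p])


# Dependencies read off the example definitions of ll, nll and the skip lists.
examples = {
    "ll": {"ll"},
    "nll": {"ll", "nll"},
    "skl1": {"skl1"},
    "skl2": {"skl1", "skl2"},
    "skl3": {"skl1", "skl2", "skl3"},
}
```

On `examples` the check reports no mutual recursion, while a pair such as `{"p": {"q"}, "q": {"p"}}` is rejected.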

*Unfolding* Given pred P(t̄) ≡ Φ and a formula P(v̄) ∗ Δ, unfolding P(v̄) means replacing P(v̄) by Φ[v̄/t̄]. We annotate each occurrence of an inductive predicate with a number, called its unfolding number. Suppose ∃w̄. r ↦ c(p̄) ∗ Q<sub>1</sub>(v̄<sub>1</sub>) ∗ ... ∗ Q<sub>m</sub>(v̄<sub>m</sub>) ∗ P(v̄<sub>0</sub>) ∧ π is the recursive rule. Then, in the unfolded formula, if P(v̄<sub>0</sub>[v̄/t̄])<sup>k<sub>1</sub></sup> and Q<sub>i</sub>(...)<sup>k<sub>2</sub></sup> are direct sub-terms of P(v̄)<sup>k</sup> as above, then k<sub>1</sub> = k+1 and k<sub>2</sub> = 0. When it is unambiguous, we discard the annotation of the unfolding number for simplicity.
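As a minimal sketch of unfolding with unfolding numbers, restricted to the predicate ll, the code below uses a hypothetical representation: `PointsTo` and `Pred` are illustrative data classes, pure constraints are plain strings, and the caller supplies the fresh existential name. It is not the data structure of the S2SLin tool.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class PointsTo:
    src: str                 # variable pointing to the cell
    node: str                # data structure name, e.g. "c1"
    fields: Tuple[str, ...]  # field contents


@dataclass(frozen=True)
class Pred:
    name: str
    args: Tuple[str, ...]
    unfolding: int = 0       # unfolding number k of this occurrence


Atom = Union[PointsTo, Pred]
Disjunct = Tuple[List[Atom], List[str]]  # (spatial atoms, pure constraints)


def unfold_ll(p: Pred, fresh: str) -> List[Disjunct]:
    """Unfold ll(r, F)^k into its base case and its recursive case.
    The recursive occurrence of ll carries the unfolding number k + 1."""
    assert p.name == "ll" and len(p.args) == 2
    r, f = p.args
    base: Disjunct = ([], [f"{r} = {f}"])
    rec: Disjunct = (
        [PointsTo(r, "c1", (fresh,)), Pred("ll", (fresh, f), p.unfolding + 1)],
        [f"{r} != {f}"],
    )
    return [base, rec]
```

Unfolding `ll(x, y)` with fresh name `X1` yields the base case `emp ∧ x = y` and the recursive case with `ll(X1, y)` annotated by 1; unfolding that occurrence again gives annotation 2.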

### 2.2 Semantics

The program state is interpreted by a pair (s,h) where s∈*Stacks*, h∈*Heaps* and stack *Stacks* and heap *Heaps* are defined as:

$$\begin{array}{l} \textit{Heaps} \stackrel{\text{def}}{=} \textit{Loc} \rightharpoonup_{fin} (\textit{Node} \to (\textit{Fields} \to \textit{Val} \cup \textit{Loc})^m) \\ \textit{Stacks} \stackrel{\text{def}}{=} \textit{Var} \to \textit{Val} \cup \textit{Loc} \end{array}$$

Note that we assume that every data structure contains at most m fields. Given a formula Φ, its semantics is given by a relation s, h |= Φ, in which the stack s and the heap h satisfy the constraint Φ. The semantics is shown below.

$$\begin{array}{lcl}
s, h \models \mathsf{emp} & \text{iff} & \textit{dom}(h) = \emptyset \\
s, h \models v \mapsto c(f_i : v_i) & \text{iff} & \textit{dom}(h) = \{s(v)\},\ h(s(v)) = g,\ g(c, f_i) = s(v_i) \\
s, h \models \mathsf{P}(\bar{v}) & \text{iff} & (h, s(\bar{v}_1), \ldots, s(\bar{v}_k)) \in [\![\mathsf{P}]\!] \\
s, h \models \kappa_1 \ast \kappa_2 & \text{iff} & \exists h_1, h_2 \text{ s.t. } h_1 \# h_2,\ h = h_1 \cdot h_2,\ s, h_1 \models \kappa_1 \text{ and } s, h_2 \models \kappa_2 \\
s, h \models \mathtt{true} & \text{iff} & \text{always} \\
s, h \models \kappa \wedge \pi & \text{iff} & s, h \models \kappa \text{ and } s \models \pi \\
s, h \models \exists v.\, \Delta & \text{iff} & \exists \alpha.\ s[v \mapsto \alpha], h \models \Delta \\
s, h \models \Phi_1 \vee \Phi_2 & \text{iff} & s, h \models \Phi_1 \text{ or } s, h \models \Phi_2
\end{array}$$

*dom*(g) is the domain of g, h<sub>1</sub>#h<sub>2</sub> denotes that the heaps h<sub>1</sub> and h<sub>2</sub> are disjoint, i.e., *dom*(h<sub>1</sub>) ∩ *dom*(h<sub>2</sub>) = ∅, and h<sub>1</sub>·h<sub>2</sub> denotes the union of two disjoint heaps. If s is a stack, v ∈ *Var*, and α ∈ *Val* ∪ *Loc*, we write s[v↦α] = s if v ∈ *dom*(s), otherwise s[v↦α] = s ∪ {(v, α)}. Semantics of non-heap (pure) formulas is omitted for simplicity. The interpretation of an inductive predicate P(t̄) is based on the least fixed point semantics ⟦P⟧.

Entailment Δ |= Δ′ holds iff for all s and h, if s, h |= Δ then s, h |= Δ′.

## 3 Entailment Problem & Overview

Throughout this work, we consider the following problem.

```
PROBLEM: QF ENT−SLLIN.
INPUT: Δa ≡ κa∧πa and Δc ≡ κc∧πc where FV(Δc) ⊆ FV(Δa) ∪ {null}.
QUESTION: Does Δa |= Δc hold?
```
An entailment, denoted as e, is syntactically formalized as Δ<sub>a</sub> ⊢ Δ<sub>c</sub>, where Δ<sub>a</sub> and Δ<sub>c</sub> are quantifier-free formulas whose syntax is defined in the preceding section.

In Sect. 3.1, we present the basis of an exclude-the-middle proof system and our approach to QF ENT−SLLIN. In Sect. 3.2, we describe the foundation of cyclic proofs.

## 3.1 Exclude-the-Middle Proof System

Given a goal Δ<sub>a</sub> ⊢ Δ<sub>c</sub>, an entailment proof system might derive entailments with a disjunction in the right-hand side (RHS). Such an entailment can be obtained by a proof rule that replaces an inductive predicate by its definition rules. The authors of Smallfoot [2] introduced a normal form and proof rules to prevent such entailments when the predicates are lists or trees. Smallfoot considers the following two scenarios.


In doing so, Smallfoot does not introduce a disjunction in the RHS. However, as it uses specific lemmas in the induction reasoning, it only works for the hardwired lists.

This paper proposes S2SLin as an exclude-the-middle system for user-defined predicates, those in SHLIDe. Instead of using hardwired lemmas, we apply cyclic proofs for induction reasoning. For instance, to discharge the entailment e<sub>2</sub> above, S2SLin first unfolds ll(x, z) in the LHS and obtains two premises:

$$\begin{array}{l} \text{-- } e_{21} : (\mathsf{emp} \wedge x = z) \ast \Delta \vdash \mathsf{ll}(x, \mathsf{null}) \ast \Delta'; \text{ and} \\ \text{-- } e_{22} : (x \mapsto c(y) \ast \mathsf{ll}(y, z) \wedge x \neq z) \ast \Delta \vdash \mathsf{ll}(x, \mathsf{null}) \ast \Delta' \end{array}$$

While it reduces e<sub>21</sub> to Δ[z/x] ⊢ ll(z, null) ∗ Δ′[z/x], for e<sub>22</sub> it further applies the frame rule as in Case 1 above and obtains ll(y, z) ∗ Δ ∧ x ≠ z ⊢ ll(y, null) ∗ Δ′. Then, it makes a back-link between the latter and e<sub>2</sub> and closes this path. Doing so does not introduce disjunctions in the RHS and can handle user-defined predicates.
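The left unfolding step that produces e<sub>21</sub> and e<sub>22</sub> can be sketched as follows. This is an illustrative encoding only: formulas are represented as a pair of a spatial atom list and a pure constraint list, and the names `unfold_ll` and `unfold_lhs` are hypothetical, not taken from S2SLin.

```python
from itertools import count

_fresh = count(1)  # supply of fresh variable names (illustrative)


def unfold_ll(x: str, z: str):
    """Return the base and recursive cases of ll(x, z) as (spatial, pure) pairs."""
    y = f"y{next(_fresh)}"
    base = ([], [("=", x, z)])                             # emp /\ x = z
    rec = ([("pto", x, y), ("ll", y, z)], [("!=", x, z)])  # x -> c(y) * ll(y, z) /\ x != z
    return base, rec


def unfold_lhs(spatial, pure, index):
    """Unfold the ll atom at `index` in the LHS; one premise LHS per case."""
    tag, x, z = spatial[index]
    assert tag == "ll"
    rest = spatial[:index] + spatial[index + 1:]
    return [(case_sp + rest, pure + case_pure) for case_sp, case_pure in unfold_ll(x, z)]


# ll(x, z) * a -> c(b): the second atom plays the role of the frame.
premises = unfold_lhs([("ll", "x", "z"), ("pto", "a", "b")], [], 0)
```

Each unfolding yields exactly two premises whose pure parts are mutually exclusive (x = z versus x ≠ z), which is what keeps the case analysis on the LHS rather than introducing a disjunction on the RHS.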

### 3.2 Cyclic Proofs

Central to our work is a procedure that constructs a cyclic proof for an entailment. Given an entailment Δ ⊢ Δ′, if our system can derive a cyclic proof, then Δ |= Δ′. If, instead, it is stuck without a proof, then Δ |= Δ′ is not valid.

The procedure includes proof rules, each of which is of the form:

$$\mathsf{PR}_0 \; \frac{e_1 \quad \dots \quad e_n}{e} \; \mathit{cond}$$

where entailment e (called the conclusion) is reduced to entailments e<sub>1</sub>, .., e<sub>n</sub> (called the premises) through inference rule PR<sub>0</sub> given that the *side condition* cond holds.

A cyclic proof is a proof tree T<sub>i</sub> which is a tuple (V, E, C) where


A leaf node is marked as closed if it is evaluated as valid (i.e., an axiom is applied to the node), invalid (i.e., no rule can apply), or linked back. Otherwise, it is marked as open. A proof tree is *invalid* if it contains at least one invalid leaf node. It is a *pre-proof* if all its leaf nodes are either valid or linked back. Furthermore, a pre-proof is a cyclic proof if a global soundness condition is established in the tree. Intuitively, this condition requires that for every C(e<sub>c</sub>→e<sub>b</sub>, σ), there exist inductive predicates P(t̄<sub>1</sub>) in e<sub>c</sub> and Q(t̄<sub>2</sub>) in e<sub>b</sub> such that Q(t̄<sub>2</sub>) is a subterm of P(t̄<sub>1</sub>).
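The leaf statuses and the resulting classification of a proof tree can be sketched as a small data structure. The class and function names below are hypothetical, and the global soundness condition is deliberately left out: the sketch only distinguishes invalid trees, pre-proofs, and trees that still have open leaves.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"
    VALID = "valid"      # closed by an axiom
    INVALID = "invalid"  # no rule applies
    LINKED = "linked"    # bud of a back-link


@dataclass
class Node:
    label: str
    status: Status = Status.OPEN
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes of the tree rooted at `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def classify(root: Node) -> str:
    """Classify a tree by its leaves: 'invalid', 'pre-proof', or still 'open'."""
    statuses = [leaf.status for leaf in leaves(root)]
    if Status.INVALID in statuses:
        return "invalid"
    if all(st in (Status.VALID, Status.LINKED) for st in statuses):
        return "pre-proof"
    return "open"


# A toy tree with two axiom-closed leaves and one linked-back leaf.
tree = Node("root", children=[Node("inner", children=[
    Node("leaf_a", Status.VALID), Node("leaf_b", Status.VALID), Node("bud", Status.LINKED)])])
```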

Definition 1 (Trace) *Let* T<sub>i</sub> *be a pre-proof of* Δ<sub>a</sub> ⊢ Δ<sub>c</sub> *and* (Δ<sub>a<sub>i</sub></sub> ⊢ Δ<sub>c<sub>i</sub></sub>)<sub>i≥0</sub> *be a path of* T<sub>i</sub>*. A trace following* (Δ<sub>a<sub>i</sub></sub> ⊢ Δ<sub>c<sub>i</sub></sub>)<sub>i≥0</sub> *is a sequence* (α<sub>i</sub>)<sub>i≥0</sub> *such that each* α<sub>i</sub> *(for all* i ≥ 0*) is a subformula of* Δ<sub>a<sub>i</sub></sub> *containing predicate* P(t̄)<sup>u</sup>*, and either:*


Definition 2 (Cyclic proof) *A pre-proof* T<sub>i</sub> *of* Δ<sub>a</sub> ⊢ Δ<sub>c</sub> *is a cyclic proof if, for every infinite path* (Δ<sub>a<sub>i</sub></sub> ⊢ Δ<sub>c<sub>i</sub></sub>)<sub>i≥0</sub> *of* T<sub>i</sub>*, there is a tail of the path* p = (Δ<sub>a<sub>i</sub></sub> ⊢ Δ<sub>c<sub>i</sub></sub>)<sub>i≥n</sub> *such that there is a trace following* p *which has infinitely many progressing points.*

Suppose that all proof rules are (locally) sound (i.e., if the premises are valid, then the conclusion is valid). The following Theorem shows *global soundness*.

Theorem 1 (Soundness [5]). *If there is a cyclic proof of* Δ<sub>a</sub> ⊢ Δ<sub>c</sub>*, then* Δ<sub>a</sub> |= Δ<sub>c</sub>*.*

The proof is by contradiction (cf. [5]). Intuitively, if we can derive a cyclic proof for Δ<sub>a</sub> ⊢ Δ<sub>c</sub> and yet Δ<sub>a</sub> |= Δ<sub>c</sub> does not hold, then the inductive predicates at the progress points are unfolded infinitely often. This infinite unfolding contradicts the least fixed point semantics of the predicates.

## 4 Cyclic Entailment Procedure

This section presents our main proposal, the entailment procedure ω-ENT with the proposed inference rules (subsection 4.1), and an illustrative example (subsection 4.2).

## 4.1 Proof Search

The proof search algorithm ω-ENT is presented in Fig. 1. ω-ENT takes e<sub>0</sub> as input, produces cyclic proofs, and based on that, decides whether the input is valid or invalid. The idea of ω-ENT is to iteratively reduce T<sub>0</sub> into a sequence of cyclic proof trees T<sub>i</sub>, i ≥ 0. Initially, for every P(v̄)<sup>k</sup> ∈ e<sub>0</sub>, k is reset to 0, and T<sub>0</sub> only has e<sub>0</sub> as an open leaf, the root. On line 3, through the procedure is_closed(T<sub>i</sub>), ω-ENT chooses an *open* leaf node e<sub>i</sub> and a proof rule PR<sub>i</sub> to apply. If is_closed(T<sub>i</sub>) returns valid (that is, every leaf is closed by an axiom rule or involved in a back-link), ω-ENT returns valid on line 4. If it returns invalid, then ω-ENT returns invalid (on line 5). Otherwise, it tries to link e<sub>i</sub> back to an internal node (on line 6). If this attempt fails, it applies the rule (line 7).

Fig. 1: Proof tree construction procedure

Note that at each leaf, is_closed attempts rules in the following order: normalization rules, axiom rules, and reduction rules. A rule PR<sub>i</sub> is chosen if its conclusion can be unified with the leaf through some substitution σ. Then, on line 7, for each premise of PR<sub>i</sub>, procedure apply creates a new open node and connects the node to e<sub>i</sub> via a new edge. If PR<sub>i</sub> is an axiom, procedure apply marks e<sub>i</sub> as closed and returns.
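The control flow described above can be sketched as a loop parameterised by the three procedures. This is a hedged reconstruction from the prose only, not the algorithm of Fig. 1: the callback signatures, the step limit, and the toy instantiation (integer leaves that count down to a back-link) are all assumptions made so that the sketch is runnable.

```python
def omega_ent(tree, is_closed, link_back, apply_rule, max_steps=10_000):
    """Skeleton of the search loop: close, link back, or apply a rule."""
    for _ in range(max_steps):
        verdict, leaf, rule = is_closed(tree)
        if verdict in ("valid", "invalid"):
            return verdict
        # Otherwise try a back-link first; apply the chosen rule only if that fails.
        if not link_back(tree, leaf):
            apply_rule(tree, leaf, rule)
    raise RuntimeError("step limit reached (limit is an artifact of this sketch)")


# Toy instantiation (purely illustrative): leaves are integers, a positive
# leaf n reduces to n - 1, leaf 0 can be linked back, a negative leaf is stuck.
def toy_is_closed(tree):
    if any(leaf < 0 for leaf in tree["open"]):
        return ("invalid", None, None)
    if not tree["open"]:
        return ("valid", None, None)
    return ("unknown", tree["open"][0], "dec")


def toy_link_back(tree, leaf):
    if leaf == 0:
        tree["open"].remove(leaf)
        return True
    return False


def toy_apply(tree, leaf, rule):
    tree["open"].remove(leaf)
    tree["open"].append(leaf - 1)


verdict_ok = omega_ent({"open": [2]}, toy_is_closed, toy_link_back, toy_apply)
verdict_bad = omega_ent({"open": [-1]}, toy_is_closed, toy_link_back, toy_apply)
```

The design point the sketch preserves is the ordering: a back-link is attempted before any rule is applied, so a leaf that can be closed by a cycle is never unfolded further.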

*Procedure* is_closed(T<sub>i</sub>) This procedure examines the following three cases.

	- (a) no inference rule can be applied to e<sub>i</sub>.
	- (b) there exists a predicate op<sub>1</sub>(E) ∈ Δ such that op<sub>2</sub>(E) ∉ Δ and one of the following conditions holds:
		- either P(E′,E,...) or E′↦c(E,..) are on both sides
		- both P(E′,E,...) ∈ Δ and E′↦c(E,..) ∈ Δ
	- (c) there exists a predicate op<sub>1</sub>(E) ∈ Δ′ such that G(op<sub>1</sub>(E)) ∈ Δ and op<sub>2</sub>(E) ∉ Δ.
	- (d) there exist x↦c<sub>1</sub>(v̄<sub>1</sub>) ∈ Δ, x↦c<sub>2</sub>(v̄<sub>2</sub>) ∈ Δ′ such that c<sub>1</sub> ≢ c<sub>2</sub> or v̄<sub>1</sub> ≢ v̄<sub>2</sub>.

In the rest, we discuss the proof rules and the auxiliary procedures in detail.

*Normalisation* An entailment is in the normal form (NF) if its LHS is in NF. We write op(E) to denote either E↦c(v̄) or P(E,F,B̄,v̄). Furthermore, the guard G(op(E)) is defined by: G(E↦c(v̄)) ≝ true and G(P(E,F,B̄,v̄)) ≝ E≠F.

Definition 3 (Normal Form) *A formula* κ∧φ∧a *is in normal form if:*


If Δ is in NF, then for any s, h |= Δ, *dom*(h) is uniquely defined by s.

The normalisation rules are presented in Fig. 2. Basically, ω-ENT applies these rules to a leaf exhaustively and transforms it into NF before applying other rules. Given an inductive predicate P(E, F, ...), rule ExM excludes the middle by doing case analysis for the predicate between the base case (i.e., E=F) and the recursive case (i.e., E≠F). The normalisation rule ≠null follows from the following facts: E↦c(\_) ⇒ E≠null and P(E,F,\_)∧E≠F ⇒ E≠null. Similarly, rule ≠∗ follows from the following facts: x↦\_ ∗ P(y,F,\_)∧y≠F ⇒ x≠y, x↦\_ ∗ y↦\_ ⇒ x≠y, and P<sub>i</sub>(x,F<sub>1</sub>,\_)∗P<sub>j</sub>(y,F<sub>2</sub>,\_)∧x≠F<sub>1</sub>∧y≠F<sub>2</sub> ⇒ x≠y.
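The facts behind rules ≠null and ≠∗ amount to saturating the pure part with disequalities for every spatial atom whose guard holds. A minimal sketch under an assumed tuple encoding (`("pto", root)` for points-to, `("pred", root, end)` for an inductive predicate); the function names are hypothetical:

```python
def guard_holds(op, pure):
    """G(E -> c(..)) is true; G(P(E, F, ..)) is E != F, looked up in the pure part."""
    if op[0] == "pto":
        return True
    root, end = op[1], op[2]
    return ("!=", root, end) in pure or ("!=", end, root) in pure


def normalise_diseq(spatial, pure):
    """Add the disequalities implied by non-empty spatial atoms."""
    result = set(pure)
    guarded = [op for op in spatial if guard_holds(op, result)]
    for op in guarded:                      # rule "!= null": guarded roots are allocated
        result.add(("!=", op[1], "null"))
    for i, a in enumerate(guarded):         # rule "!= *": separated roots are distinct
        for b in guarded[i + 1:]:
            result.add(("!=", a[1], b[1]))
    return result


# x -> _ * P(X, null, ..) with X != null known: both atoms are non-empty.
with_guard = normalise_diseq([("pto", "x"), ("pred", "X", "null")],
                             {("!=", "x", "null"), ("!=", "X", "null")})
# Same heap without X != null: the predicate may be empty, so x != X is not derived.
without_guard = normalise_diseq([("pto", "x"), ("pred", "X", "null")], set())
```

The second call shows why the guard matters: without E≠F the predicate may denote the empty heap, and no disequality between its root and other roots can be concluded.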

*Axiom and Reduction* Axiom rules include Emp, Inconsistency and Id, presented in Fig. 3. If any of these rules is applied to a leaf node, the node is evaluated as valid and marked as closed. The remaining ones in Fig. 3 are reduction rules.

For simplicity, the unfoldings in rules Frame, RInd, and LInd are applied with the following definition of inductive predicates:

$$\begin{array}{l} \mathsf{P}(x, F, \bar{B}, u, sc, tg) \equiv \mathsf{emp} \wedge x = F \wedge sc = tg \\ \quad \vee\ \exists X, sc', d_1, d_2.\ x \mapsto c(X, d_1, d_2, u, sc) \ast \mathsf{Q}_1(d_1, B) \ast \mathsf{Q}_2(d_2, X) \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg) \wedge \pi_0 \end{array}$$

where B ∈ B̄, the matrix κ′ contains two nested predicates Q<sub>1</sub> and Q<sub>2</sub>, and the heap cell c ∈ *Node* is defined as data c{c next; c<sub>1</sub> down<sub>1</sub>; c<sub>2</sub> down<sub>2</sub>; τ<sub>s</sub> scdata; τ<sub>u</sub> udata} where c<sub>1</sub>, c<sub>2</sub> ∈ *Node*, the down<sub>1</sub> and down<sub>2</sub> fields are for the nested predicates in the matrix heaps, the udata field is for the transitivity data, and the scdata field is for ordering data. The rules for the general form of the matrix heaps κ′ are presented in [28].

$$\begin{array}{c}
\textsf{Subst}\ \dfrac{\Delta[E/x] \vdash \Delta'[E/x]}{\Delta \wedge x = E \vdash \Delta'}
\qquad
\textsf{ExM}\ \dfrac{\Delta \wedge E_1 = E_2 \vdash \Delta' \quad \Delta \wedge E_1 \neq E_2 \vdash \Delta'}{\Delta \vdash \Delta'}\ \begin{array}{l} E_1 = E_2,\ E_1 \neq E_2 \notin \pi \\ \mathrm{PV}(E_1, E_2) \subseteq (\mathrm{PV}(\Delta) \cup \mathrm{PV}(\Delta'))^{S} \end{array}
\\[2ex]
{=}\textsf{L}\ \dfrac{\Delta \vdash \Delta'}{\Delta \wedge E = E \vdash \Delta'}
\qquad
\textsf{LBase}\ \dfrac{(\kappa \wedge \pi)[tg/sc] \vdash \Delta'[tg/sc]}{\mathsf{P}(E, E, \bar{B}, u, sc, tg) \ast \kappa \wedge \pi \vdash \Delta'}
\\[2ex]
{\neq}\textsf{null}\ \dfrac{op(E) \ast \kappa \wedge \pi \wedge G(op(E)) \wedge E \neq \mathsf{null} \vdash \Delta'}{op(E) \ast \kappa \wedge \pi \wedge G(op(E)) \vdash \Delta'}\ E \neq \mathsf{null} \notin \pi
\\[2ex]
{\neq}{\ast}\ \dfrac{op_1(E_1) \ast op_2(E_2) \ast \kappa \wedge \pi \wedge E_1 \neq E_2 \vdash \Delta'}{op_1(E_1) \ast op_2(E_2) \ast \kappa \wedge \pi \vdash \Delta'}\ E_1 \neq E_2 \notin \pi \text{ and } G(op_1(E_1)), G(op_2(E_2)) \in \pi
\end{array}$$

Fig. 2: Normalization rules

[Figure 3: axiom rules Id, Emp, and Inconsistency, and reduction rules including =R, Hypothesis, ∗, RInd, and LInd; the rule bodies are not recoverable from the source.]

Fig. 3: Reduction rules (with side conditions P(x,F,B̄,u,sc,tg) ∉ κ<sub>2</sub> and †: x↦c(X,E<sub>1</sub>,E<sub>2</sub>,u,sc) ∉ κ<sub>2</sub>)

=R and Hypothesis eliminate pure constraints in the RHS. In rule ∗, roots(κ) is defined inductively as: roots(emp) ≡ {}, roots(r↦\_) ≡ {r}, roots(P(r, F, ..)) ≡ {r} and roots(κ<sub>1</sub>∗κ<sub>2</sub>) ≡ roots(κ<sub>1</sub>) ∪ roots(κ<sub>2</sub>). This rule is applied in three ways. First, it is applied to an entailment of the form κ∧π ⊢ κ∧π′. It matches and discards the identical heap predicates between the two sides to generate a premise with empty heaps. As a result, this premise may be closed by the axiom rule Emp. Secondly, it is applied to an entailment of the form x<sub>1</sub>↦c<sub>1</sub>(v̄<sub>1</sub>)∗...∗x<sub>n</sub>↦c<sub>n</sub>(v̄<sub>n</sub>)∧π ⊢ κ′∧π′. For each points-to predicate x<sub>i</sub>↦c<sub>i</sub>(v̄<sub>i</sub>) ∈ κ′, ω-ENT searches for one points-to predicate x<sub>j</sub>↦c<sub>j</sub>(v̄<sub>j</sub>) in the LHS such that x<sub>j</sub>↦c<sub>j</sub>(v̄<sub>j</sub>) ≡ x<sub>i</sub>↦c<sub>i</sub>(v̄<sub>i</sub>). Lastly, it is applied to an entailment of the form Δ<sub>1</sub> ∗ Δ ⊢ Δ<sub>2</sub> ∗ Δ′ where either Δ<sub>1</sub> ⊢ Δ<sub>2</sub> or Δ ⊢ Δ′ could be linked back to an internal node.
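The inductive definition of roots(κ) translates directly into code once a spatial formula is represented as a list of atoms, so that ∗ becomes list concatenation. The tuple encoding below is an assumption made for illustration.

```python
def roots(kappa):
    """roots(emp) = {}, roots(r -> _) = {r}, roots(P(r, F, ..)) = {r},
    and roots(k1 * k2) = roots(k1) | roots(k2), with * as list concatenation."""
    out = set()
    for atom in kappa:
        if atom[0] == "pto":        # ("pto", r, ...)
            out.add(atom[1])
        elif atom[0] == "pred":     # ("pred", name, r, F, ...)
            out.add(atom[2])
        # ("emp",) contributes no root
    return out


lhs_roots = roots([("pto", "x", "X"), ("pred", "lls", "X", "null")])
```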

In RInd, for each occurrence of inductive predicates P(r,F,B̄,u,sc,tg) in κ′, ω-ENT searches for a points-to predicate r↦\_. If any of these searches fail, ω-ENT decides the conclusion as invalid. Rule LInd unfolds the inductive predicates in the LHS. Every LHS of entailments in this rule also captures the unfolding numbers for the subterm relationship and generates the progressing point in the cyclic proofs afterwards. These numbers are essential for our system to construct cyclic proofs. This rule is applied in a *depth-first* manner, i.e., if there is more than one occurrence of inductive predicates in the LHS to which this rule could be applied, the one with the greatest unfolding number is chosen. We emphasise that the last five rules still work well when the predicate in the RHS contains only a subset of the local properties wrt. the predicate in the LHS.
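The depth-first choice made by LInd can be sketched as a selection over the LHS predicate occurrences, each annotated with its unfolding number. The triple encoding and the function name are hypothetical, and ties are broken by position only as an artifact of this sketch.

```python
def pick_for_lind(lhs_predicates):
    """lhs_predicates: list of (name, args, unfolding_number).
    Return the occurrence with the greatest unfolding number, or None if empty."""
    if not lhs_predicates:
        return None
    return max(lhs_predicates, key=lambda p: p[2])


chosen = pick_for_lind([("lls", ("x", "null"), 0), ("lls", ("X", "null"), 1)])
```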

*Back-Link Generation* Procedure link_back<sup>e</sup> generates a back-link as follows. In a pre-proof, given a path containing a back-link, say e<sub>1</sub>, e<sub>2</sub>, .., e<sub>m</sub> where e<sub>1</sub> is a companion and e<sub>m</sub> a bud, then e<sub>1</sub> is in NF and of the following form:


$$x \mapsto c(X, \bar{p}, u, sc) \ast \kappa' \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \vdash \mathsf{Q}(x, F_2, \bar{B}, u, sc, tg_2) \ast \kappa' \wedge \pi'$$

We remark that sc sc-∈ π1, and if k ≥ 1, then sc<sup>i</sup> sc ∈ π

– e<sub>3</sub>, .., e<sub>m−4</sub> are obtained from applications of normalisation rules to normalise the LHS of e<sub>2</sub> due to the presence of κ′. As the roots of inductive predicates in κ′ are fresh variables, the applications of the normalisation rules above do not affect the RHS of e<sub>2</sub>. That means the RHS of e<sub>3</sub>, .., e<sub>m−4</sub> are the same as that of e<sub>2</sub>. As a result, e<sub>m−4</sub> is of the form:

$$x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_1'' \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \wedge \pi_2 \vdash \mathsf{Q}(x, F_2, \bar{B}, u, sc, tg_2) \ast \kappa' \wedge \pi'$$

where κ″<sub>1</sub> may be emp and π<sub>2</sub> is a conjunction of disequalities coming from ExM.

– e<sub>m−3</sub> is obtained from the application of ExM over x and F<sub>2</sub> and is of the form:

$$\begin{array}{c} x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_1'' \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \wedge \pi_2 \\ \wedge\ x \neq F_2 \vdash \mathsf{Q}(x, F_2, \bar{B}, u, sc, tg_2) \ast \kappa' \wedge \pi' \end{array}$$

(For the case x=F<sub>2</sub>, the rule ExM is applied repeatedly until either F ≡ F<sub>2</sub>, that is, the two sides reach the end of the same heap segment, or it is stuck.)

– e<sub>m−2</sub> is obtained from the application of RInd and is of the form:

$$\begin{array}{c} x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_1'' \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \wedge \pi_2 \\ \wedge\ x \neq F_2 \vdash x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_2'' \ast \mathsf{Q}(X, F_2, \bar{B}, u, sc', tg_2) \ast \kappa' \wedge \pi' \wedge \pi_2' \end{array}$$

– e<sub>m−1</sub> is obtained from the application of Hypothesis to eliminate π′<sub>2</sub> (otherwise, it is stuck) and is of the form:

$$\begin{array}{c} x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_1'' \ast \mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \wedge \pi_2 \\ \wedge\ x \neq F_2 \vdash x \mapsto c(X, \bar{p}, u, sc) \ast \kappa_2'' \ast \mathsf{Q}(X, F_2, \bar{B}, u, sc', tg_2) \ast \kappa' \wedge \pi' \end{array}$$

– e<sub>m</sub> is obtained from the application of ∗ and is of the form:

$$\mathsf{P}(X, F, \bar{B}, u, sc', tg)^{k+1} \ast \kappa \wedge \pi \wedge x \neq F \wedge x \neq \mathsf{null} \wedge \pi_1 \wedge \pi_2 \wedge x \neq F_2 \vdash \mathsf{Q}(X, F_2, \bar{B}, u, sc', tg_2) \ast \kappa' \wedge \pi'$$

When k ≥ 1, it is always possible to link e<sub>m</sub> back to e<sub>1</sub> through the substitution σ ≡ [x/X, sc/sc′] after weakening some pure constraints in its LHS.

Fig. 4: Cyclic proof of lls(x,null,mi,ma)<sup>0</sup> ∧ x≠null ⊢ llb(x,null,mi).

## 4.2 Illustrative Example

We illustrate our system through the following example:

$$e_0: \mathsf{lls}(x, \mathsf{null}, mi, ma)^{0} \wedge x \neq \mathsf{null} \vdash \mathsf{llb}(x, \mathsf{null}, mi)$$

where the sorted linked list lls (mi is the minimum value and ma is the maximum value) is defined in Sect. 2.1 and llb defines singly-linked lists whose values are greater than or equal to a constant number. In particular, predicate llb is defined as follows.

$$\begin{array}{l} \mathsf{pred}\ \mathsf{llb}(r, F, b) \equiv \mathsf{emp} \wedge r = F \\ \quad \vee\ \exists X_{tl}, d.\ r \mapsto c_4(X_{tl}, d) \ast \mathsf{llb}(X_{tl}, F, b) \wedge r \neq F \wedge b \leq d \end{array}$$

Since the LHS is stronger than the RHS, this entailment is valid. Our system can generate the cyclic proof (shown in Fig. 4) to prove the validity of e<sub>0</sub>. In the following, we show step by step how the proof is created. Firstly, e<sub>0</sub>, which is in NF, is applied with rule LInd to unfold predicate lls(x,null,mi,ma)<sup>0</sup> and obtain e<sub>1</sub> as:

$$e_1: x \mapsto c_4(X, m') \ast \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \vdash \mathsf{llb}(x, \mathsf{null}, mi)$$

We remark that the unfolding number of the recursive predicate lls in the LHS is increased by 1. Next, our system normalizes e<sub>1</sub> by applying rule ExM to X and null to generate two children, e<sub>2</sub> and e<sub>3</sub>, as follows.

$$\begin{array}{l} e_2: x \mapsto c_4(X, m') \ast \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \wedge X = \mathsf{null} \\ \qquad \vdash \mathsf{llb}(x, \mathsf{null}, mi) \\ e_3: x \mapsto c_4(X, m') \ast \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \wedge X \neq \mathsf{null} \\ \qquad \vdash \mathsf{llb}(x, \mathsf{null}, mi) \end{array}$$
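The case split performed by ExM on the pure part of the LHS can be sketched as below. The encoding of pure constraints as tuples and the function name `exm` are assumptions; the side condition checked is the one stated for the rule, namely that neither the equality nor the disequality is already present.

```python
def exm(pure, e1, e2):
    """Return the pure parts of the two premises of ExM, or None when the
    side condition fails because e1 = e2 or e1 != e2 is already recorded."""
    known = {("=", e1, e2), ("=", e2, e1), ("!=", e1, e2), ("!=", e2, e1)}
    if known & set(pure):
        return None
    return (list(pure) + [("=", e1, e2)], list(pure) + [("!=", e1, e2)])


# Pure part of e1 in the running example: x != null and mi <= m'.
pure_e1 = [("!=", "x", "null"), ("<=", "mi", "m'")]
eq_case, neq_case = exm(pure_e1, "X", "null")
```

Applying `exm` again to either result with the same pair returns `None`, which reflects how the side condition prevents repeated splits on the same pair.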

For the left child, it applies normalization rules to obtain e<sub>4</sub> (substituting X by null) and then e<sub>5</sub>, by LBase to unfold lls(null,null,m′,ma)<sup>1</sup> to the base case, as:

$$\begin{array}{l} e_4: x \mapsto c_4(\mathsf{null}, m') \ast \mathsf{lls}(\mathsf{null}, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \vdash \mathsf{llb}(x, \mathsf{null}, mi) \\ e_5: x \mapsto c_4(\mathsf{null}, ma) \wedge x \neq \mathsf{null} \wedge mi \leq ma \vdash \mathsf{llb}(x, \mathsf{null}, mi) \end{array}$$

$$\begin{array}{l} e_6: x \mapsto c_4(\mathsf{null}, ma) \wedge x \neq \mathsf{null} \wedge mi \leq ma \vdash x \mapsto c_4(\mathsf{null}, ma) \ast \mathsf{llb}(\mathsf{null}, \mathsf{null}, mi) \wedge mi \leq ma \\ e_6': x \mapsto c_4(\mathsf{null}, ma) \wedge x \neq \mathsf{null} \wedge mi \leq ma \vdash x \mapsto c_4(\mathsf{null}, ma) \wedge mi \leq ma \end{array}$$

After that, as mi≤ma ⇒ mi≤ma, e<sub>6</sub>′ is applied with Hypothesis to obtain e<sub>7</sub>.

$$e_7: x \mapsto c_4(\mathsf{null}, ma) \wedge x \neq \mathsf{null} \wedge mi \leq ma \vdash x \mapsto c_4(\mathsf{null}, ma)$$

As the LHS of e<sub>7</sub> is in NF and a base formula, it is sound and complete to apply rule ∗ to have e<sub>8</sub> as emp ∧ x≠null ∧ mi≤ma ⊢ emp. By Emp, e<sub>8</sub> is decided as valid. For the right branch of the proof, e<sub>3</sub> is applied with rule ≠∗ and then RInd to obtain e<sub>9</sub>:

$$\begin{array}{c} e_9: x \mapsto c_4(X, m') \ast \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \wedge X \neq \mathsf{null} \wedge x \neq X \\ \vdash x \mapsto c_4(X, m') \ast \mathsf{llb}(X, \mathsf{null}, mi) \wedge mi \leq m' \end{array}$$

Then, e<sup>9</sup> is applied with Hypothesis to eliminate the pure constraint in the RHS:

$$\begin{array}{c} e_{10}: x \mapsto c_4(X, m') \ast \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \wedge X \neq \mathsf{null} \wedge x \neq X \\ \vdash x \mapsto c_4(X, m') \ast \mathsf{llb}(X, \mathsf{null}, mi) \end{array}$$

Rule ∗ is then applied to e<sub>10</sub> to obtain e<sub>11</sub> and e<sub>12</sub> as follows.

$$\begin{array}{l} e_{11}: x \mapsto c_4(X, m') \vdash x \mapsto c_4(X, m') \\ e_{12}: \mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge x \neq \mathsf{null} \wedge mi \leq m' \wedge X \neq \mathsf{null} \wedge x \neq X \vdash \mathsf{llb}(X, \mathsf{null}, mi) \end{array}$$

e<sub>11</sub> is valid by Id. e<sub>12</sub> is successfully linked back to e<sub>0</sub> to form a pre-proof as

$$(\mathsf{lls}(X, \mathsf{null}, m', ma)^{1} \wedge X \neq \mathsf{null})[x/X, mi/m'] \vdash \mathsf{llb}(X, \mathsf{null}, mi)[x/X, mi/m']$$

is identical to e<sub>0</sub>. Since lls(X,null,m′,ma)<sup>1</sup> in e<sub>12</sub> is a subterm of lls(x,null,mi,ma)<sup>0</sup> in e<sub>0</sub>, our system decides that e<sub>0</sub> is valid with the cyclic proof presented in Fig. 4.
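The back-link check for e<sub>12</sub> and e<sub>0</sub> can be replayed mechanically. In the sketch below, atoms are tuples, the substitution [x/X, mi/m′] is a dictionary from replaced variable to replacement, and weakening is modelled by letting the bud carry extra LHS atoms that the companion does not need. The encoding and the function names are assumptions, and unfolding numbers are omitted.

```python
def substitute(atom, sigma):
    """Apply a variable substitution to every component of an atom."""
    return tuple(sigma.get(t, t) for t in atom)


def links_back(bud_lhs, bud_rhs, comp_lhs, comp_rhs, sigma):
    """The bud links to the companion if, under sigma, the companion's LHS
    atoms all occur in the bud's LHS (extra bud atoms are weakened away)
    and the two RHS coincide."""
    lhs = {substitute(a, sigma) for a in bud_lhs}
    rhs = {substitute(a, sigma) for a in bud_rhs}
    return set(comp_lhs) <= lhs and rhs == set(comp_rhs)


e12_lhs = [("lls", "X", "null", "m'", "ma"), ("!=", "x", "null"),
           ("<=", "mi", "m'"), ("!=", "X", "null"), ("!=", "x", "X")]
e12_rhs = [("llb", "X", "null", "mi")]
e0_lhs = [("lls", "x", "null", "mi", "ma"), ("!=", "x", "null")]
e0_rhs = [("llb", "x", "null", "mi")]
sigma = {"X": "x", "m'": "mi"}  # the substitution [x/X, mi/m']

linked = links_back(e12_lhs, e12_rhs, e0_lhs, e0_rhs, sigma)
```

Without the substitution the check fails, since the roots of the two lls occurrences differ.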

## 5 Soundness, Completeness, and Complexity

We describe the soundness, termination, and completeness of ω-ENT. First, we need to show the invariant about the quantifier-free entailments of our system.

Corollary 1. *Every entailment derived from* ω*-*ENT *is quantifier-free.*

The following lemma shows the soundness of the proof rules.

Lemma 1 (Soundness). *For each proof rule, the conclusion is valid if all premises are valid.*

As every backlink generated contains at least one pair of inductive predicate occurrences in a subterm relationship, the global soundness condition holds in our system.

Lemma 2 (Global Soundness). *A pre-proof derived is indeed a cyclic proof.*


The termination relies on the number of premises/entailments generated by ∗. As the number of inductive symbols and their arities are finite, there is a finite number of equivalence classes of these entailments in which any two entailments in the same class are equivalent under some substitution and linked back together. Therefore, the number of premises generated by the rule ∗ is finite, considering the back-links generation.

Lemma 3. ω*-*ENT *terminates.*

In the following, we show the complexity analysis. First, we show that every occurrence of inductive predicates in the LHS is unfolded at most two times.

Lemma 4. *Given any entailment* P(v̄)<sup>k</sup> ∗ Δ<sub>a</sub> ⊢ Δ<sub>c</sub>*,* 0 ≤ k ≤ 2*.*

Let n be the maximum number of predicates (both inductive predicates and points-to predicates) among the LHS of the input and the definitions in P, and m be the maximum number of fields of data structures. Then, the complexity is defined as follows.

Proposition 1 (Complexity). QF ENT−SLLIN *is* O(n × 2<sup>m</sup> + n<sup>3</sup>)*.*

If m is bounded by a constant, the complexity becomes polynomial in n.
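To see how the bound behaves, the expression inside the O(·) of Proposition 1 can be evaluated directly. The function below is only an arithmetic illustration with a hypothetical name; it ignores the constant factors hidden by the asymptotic notation.

```python
def step_bound(n: int, m: int) -> int:
    """Evaluate n * 2^m + n^3, the expression inside the O(.) of Proposition 1."""
    return n * 2 ** m + n ** 3


# With m fixed, doubling n is dominated by the cubic term.
small = step_bound(10, 3)   # 10 * 8 + 1000
large = step_bound(20, 3)   # 20 * 8 + 8000
```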

Our completeness proofs are shown in two steps. First, we show the proofs for an entailment whose LHS is a base formula. Second, we show the correctness when the LHS contains inductive predicates. In the following, we first define the base formulas of the LHS derived by ω-ENT from occurrences of inductive predicates. Based on that, we define bad models to capture counter-models of invalid entailments.

Definition 4 (SHLIDe Base) *Given* κ*, define* κ̄ *as follows.*

$$\begin{array}{l} \overline{\mathsf{P}(E, F, \bar{B}, u, sc, tg)} \stackrel{def}{=} E \mapsto c(F, E_1, E_2, u, tg) \ast \overline{\mathsf{Q}_1(E_1, B)} \ast \overline{\mathsf{Q}_2(E_2, F)} \wedge \pi_0 \\ \overline{E \mapsto c(\bar{v})} \stackrel{def}{=} E \mapsto c(\bar{v}) \qquad \overline{\mathsf{emp}} \stackrel{def}{=} \mathsf{emp} \qquad \overline{\kappa_1 \ast \kappa_2} \stackrel{def}{=} \overline{\kappa_1} \ast \overline{\kappa_2} \end{array}$$

The definition for general predicates with arbitrary matrix heaps is presented in [28]. As P does not include mutual recursion (Condition C3), the definition above terminates in a finite number of steps. In a pre-proof, these SHLIDe base formulas of the LHS are obtained once every inductive predicate has been unfolded.
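The shape of the base construction can be illustrated on a deliberately simplified fragment. The sketch below assumes predicates with only a root and an end parameter, so it drops the nested predicates Q<sub>1</sub> and Q<sub>2</sub>, the data fields, and the pure part π<sub>0</sub> of Definition 4; the tuple encoding and the function name are hypothetical.

```python
def shlide_base(kappa):
    """Simplified base: each predicate P(E, F) becomes one cell E -> c(F);
    points-to atoms are kept and emp contributes nothing."""
    out = []
    for atom in kappa:
        if atom[0] == "pred":       # ("pred", name, root, end)
            _, _name, root, end = atom
            out.append(("pto", root, end))
        elif atom[0] == "pto":      # ("pto", root, next)
            out.append(atom)
        # ("emp",) is the unit of * and is dropped
    return out


base = shlide_base([("pred", "ll", "x", "y"), ("emp",), ("pto", "y", "null")])
```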

Lemma 5. *If* κ ∧ π *is in NF, then* κ̄ ∧ π *is in NF, and* κ̄ ∧ π ⊢ κ *is valid.*

In other words, κ̄ ∧ π is an under-approximation of κ ∧ π; invalidity of κ̄ ∧ π ⊢ Δ′ implies invalidity of κ ∧ π ⊢ Δ′.

Definition 5 (Bad Model) *The bad model for* κ∧φ∧a *in NF is obtained by assigning*



The following lemma states the correctness of the procedure is_closed for cases 2(b-d).

Lemma 7 (Stuck Invalidity). *Given* κ∧π ⊢ Δ *in NF, it is* invalid *if the procedure* is_closed *returns* invalid *for cases 2(b-d).*

A bad model of κ∧π is a counter-model. Cases 2b) and 2c) show that the heaps of bad models are not connected, and thus, according to conditions C1 and C2, any model of the LHS could not be a model of the RHS. Case 2d) shows that the heaps of the two sides could not be matched. We next show the correctness of Case 2(a) of the procedure is_closed, and that invalidity is preserved during the proof search in ω-ENT.

Proposition 2 (Invalidity Preservation). *If* ω*-*ENT *is stuck, the input is invalid.*

In other words, if ω-ENT returns invalid, we can construct a bad model.

Theorem 2. QF ENT−SLLIN *is decidable.*

## 6 Implementation and Evaluation

We implement S2SLin using OCaml. This implementation is an instantiation of a general framework for cyclic proofs. We utilize the cyclic proof systems to derive bases for inductive predicates shown in [24] to discharge satisfiability of separation logic formulas. We use the solver presented in [29,31] for those formulas beyond this fragment. We also develop a built-in solver for discharging equalities.

We evaluated S2SLin to show that i) it can discharge problems in SHLIDe effectively; and ii) its performance is competitive with state-of-the-art solvers. The evaluation of S2SLin is provided as a companion artifact [27].

*Experiment settings* We have evaluated S2SLin on entailment problems taken from SL-COMP benchmarks [38], a competition of separation logic solvers. We take 356 problems (out of 983) in two divisions of the competition, *qf_shls_entl* and *qf_shlid_entl*, and one new division, *qf_shlid2_entl*. All these problems semantically belong to our decidable fragment, and their syntax is written in SMT 2.6 format [39].



Table 1: Experimental results

To evaluate S2SLin's performance, we compared it with state-of-the-art tools such as CyclistSL [5], Spen [15], Songbird [40], SLS [41] and Harrsh [23]. We omitted Cycomp [42], as these benchmarks are beyond its decidable fragment. Note that CyclistSL, Songbird and SLS are not complete; for non-valid problems, while CyclistSL returns unknown, Songbird and SLS use some heuristic to guess the outcome. For each division, we report the number of correct outputs (invalid, valid) and the time (in minutes and seconds) taken by each tool. Note that we use the status (invalid, valid) annotated with each problem in the SL-COMP benchmark as the ground truth. If the output is the same as the status, we classify it as correct; otherwise, it is marked as incorrect. We also note that in these experiments, we used the competition pre-processing tool [39] to transform the SMT 2.6 format into the corresponding formats of the tools before running them. All experiments were performed on an Intel Core i7-6700 CPU at 3.4 GHz with 8 GB RAM. The CPU timeout is 600 seconds.
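The classification of tool outputs against the annotated status can be sketched as a simple tally. The problem names and outputs below are made up for illustration, and one assumption goes beyond the text: outputs other than valid or invalid (for instance unknown, a timeout, or a missing answer) are counted separately as undecided rather than as incorrect.

```python
from collections import Counter


def score(outputs, ground_truth):
    """Tally correct valid/invalid answers, wrong answers, and undecided problems."""
    tally = Counter()
    for problem, status in ground_truth.items():
        out = outputs.get(problem, "unknown")
        if out == status:
            tally[f"correct_{status}"] += 1
        elif out in ("valid", "invalid"):
            tally["incorrect"] += 1
        else:
            tally["undecided"] += 1  # assumption: unknown/timeout is not a wrong answer
    return tally


# Hypothetical data, not taken from the SL-COMP benchmarks.
truth = {"p1": "valid", "p2": "invalid", "p3": "valid", "p4": "invalid"}
outputs = {"p1": "valid", "p2": "valid", "p3": "unknown"}
result = score(outputs, truth)
```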

*Experiment results* The experimental results are reported in Table 1. In this table, the first column presents the names of the tools. The following three columns show the results of the first division, including the number of correct invalid outputs, the number of correct valid outputs and the taken time (where *m* for minutes and *s* for seconds), respectively. The number between each pair of brackets *(...)* in the third row shows the number of problems in the corresponding column. Similarly, the following two groups of six columns describe the results of the second and third divisions, respectively.

In general, the experimental results show that S2SLin is the one (and only one) that could produce all the correct results. Other solvers either produced wrong results or could discharge a fraction of the experiments. Moreover, S2SLin took a short time for the experiments (8.38 seconds compared to 15.91 seconds for Spen, 324 minutes for Songbird, 635 minutes for Harrsh, 739 minutes for SLS and 2120 minutes for CyclistSL). While SLS returned 14 false negatives, Spen reported 20 false positives. CyclistSL, Songbird and Harrsh did not produce any wrong results. Of 569 tests, CyclistSL could handle 85 tests (15%), Harrsh could handle 215 tests (38%), and Songbird could decide on 235 tests (41.3%). In the total of 223 valid tests, CyclistSL could handle 85 problems (38%), and Songbird could decide 222 problems (99.5%).

Now we examine the results for each division in detail. For *qf shls entl*, Spen returned all correct results, Songbird 186, Harrsh 155, and CyclistSL 58. With the timeout raised to 2400 seconds, both Songbird and Harrsh produced all the correct results. Division *qf shlid entl* includes 24 invalid problems and 36 valid problems. While Songbird decided 37 problems correctly, CyclistSL produced 24 correct results. Spen reported 27 correct results and 13 false positives (skl2−vc{01 − 04}, skl3−vc01, skl3−vc{03 − 10}). The last division, *qf shlid2 entl*, includes 14 invalid and 13 valid test problems. While Songbird decided only 12 problems correctly, CyclistSL produced 3 correct outcomes. Spen reported 10 correct results. However, it produced 7 false positives (ls−mul−vc{01 − 03}, ls−mul−vc05, nll−mul−vc{01 − 03}).

We believe that engineering design and effort play an essential role alongside theory development. Since our experiments provide breakdown results of the two SL-COMP competition divisions, we hope that they provide an initial understanding of the SL-COMP benchmarks and tools. Consequently, this might reduce the effort to prepare experiments over these benchmarks to evaluate new SL solvers. Finally, one might point out that S2SLin performed well because the entailments in the experiments are within its scope. We do not entirely disagree with this argument but would like to emphasize that tools do not always work well on favourable benchmarks. For example, Spen produced wrong results on *qf shlid entl*, and Harrsh did not handle *qf shlid entl* and *qf shlid2 entl* well, although these problems are in their decidable fragments.

## 7 Related Work

S2SLin is a variant of the cyclic proof systems [3,4,5,26] and [42]. Unlike existing cyclic proof systems, the soundness of S2SLin is local, and the proof search is not backtracking. The work presented in [42] shows the completeness of the cyclic proof system. Its main contribution is introducing the rule ∗ for those entailments with a disjunction in the RHS obtained from predicate unfolding. In contrast to [42], our work includes normalization to soundly and completely avoid disjunction in the RHS during unfolding. Moreover, our decidable fragment SHLIDe is non-overlapping to the cone predicates introduced in [42]. Furthermore, due to the empty heap in the base cases, the matching rule in [42] cannot be applied to the predicates in SHLIDe. Finally, our work also presents how to obtain the global soundness condition for cyclic proofs.

Our work relates to the inductive theorem provers introduced in [10], [40] and Smallfoot [2]. While [10] is based on structural induction, [40] is based on mathematical induction. Smallfoot [2] proposed a decision procedure for linked lists and trees. It used a fixed compositional rule as a consequence of induction reasoning to handle inductive entailments. Compared with Smallfoot, our proof system replaces the compositional rule by combining rule LInd and the back-link construction. Our system could support induction reasoning on a much more expressive fragment of inductive predicates.

Our proposal also relates to works that use lemmas as consequences of induction reasoning [2,16,30,41]. These works in [16,25,30,41] automatically generate lemmas for some classes of inductive predicates. S2 [25] generated lemmas to normalize (such as split and equivalence) the shapes of the synthesized data structures. [16] proposed to generate several sets of lemmas not only for compositional predicates but also for different predicates (e.g., completion lemmas, stronger lemmas and static parameter contraction lemmas). SLS [41] aims to infer general lemmas to prove an entailment. Similarly, S2ENT [30] solves a more generic problem, frame inference, using cyclic proofs and lemma synthesis. It infers a shape-based residual frame in the LHS and then synthesizes the pure constraints over the two sides.

S2SLin relates to model-based decision procedures that reduce the entailment problem in separation logic to a well-studied problem in other domains. For instance, in [8,11,17], the entailment problem, including singly-linked lists and their invariants, is reduced to the problem of inclusion checking in graph theory. The authors in [18] reduced the entailment problem to the satisfiability problem in monadic second-order logic. This reduction could handle an expressive fragment of spatial-based predicates called bounded tree-width. Moreover, the work presented in [23] shows a model-based decision procedure for a subfragment of the bounded tree-width fragment. Furthermore, while the work in [15,19] reduced the entailment problem to the inclusion checking problem in tree automata, [21] presented an idea to reduce the problem to the inclusion checking problem in heap automata. Moreover, while the procedure in [15] supported compositional predicates (single and double links) well, the procedure in [19] could handle predicates satisfying local properties (e.g., trees with parent pointers). Our decidable fragment subsumes the one described in [2,11,15] but is incomparable to the ones presented in [8,17,18,19]. Works in [34] and [35,36] reduced the entailment problem in separation logic to the satisfiability problem in SMT. While GRASShopper [35,36] could handle transitive closure pure properties, S2SLin is capable of supporting local ones. Unlike GRASShopper, which reduces entailment to SMT problems, S2SLin reduces an entailment to admissible entailments and detects repetitions via cyclic proofs.

Decidable fragments and complexity results of the entailment problem in separation logic with inductive predicates have been well studied. The entailment is 2-EXPTIME in cone predicates [42], the bounded tree-width predicates and beyond [18,14], and EXPTIME in a sub-fragment of cone predicates [19]. In the other class, entailment is in polynomial time for singly-linked lists [11] and semantically linear inductive predicates [15]. Moreover, the extensions with arithmetic [17] are in polynomial time but become EXPTIME when the lists are extended with double links [8]. SHLIDe (with nested lists, trees and arithmetic properties) is roughly in the "middle" of the two classes above. The entailment is EXPTIME and becomes polynomial under the upper bound restriction.

## 8 Conclusion

We have presented a novel decision procedure for the quantifier-free entailment problem in separation logic combined with inductive definitions of compositional predicates and pure properties. Our proposal is the first complete cyclic proof system for the problem in separation logic without back-tracking. We have implemented the proposal in S2SLin and evaluated it over the set of nontrivial entailments taken from the SL-COMP competition. The experimental results show that our proposal is effective and efficient when compared to the state-of-the-art solvers. For future work, we plan to develop a bi-abductive procedure based on an extension of this work with the cyclic frame inference procedure presented in [30]. This extension is fundamental to obtaining a compositional shape analysis beyond lists and trees. Another direction is to formally prove that our system is as strong as Smallfoot in the decidable fragment with lists and trees [2]: given an entailment, if Smallfoot can produce a proof, then so can S2SLin.

## References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Just Testing**

Rob van Glabbeek<sup>1,2</sup>

<sup>1</sup> School of Informatics, University of Edinburgh, Edinburgh, UK <sup>2</sup> School of Computer Science and Engineering, University of New South Wales, Sydney, Australia rvg@cs.stanford.edu

**Abstract.** The concept of must testing is naturally parametrised with a chosen completeness criterion, defining the complete runs of a system. Here I employ justness as this completeness criterion, instead of the traditional choice of progress. The resulting must-testing preorder is incomparable with the default one, and can be characterised as the fair failure preorder of Vogler. It also is the coarsest precongruence preserving linear time properties when assuming justness.

As my system model I here employ Petri nets with read arcs. Through their Petri net semantics, this work applies equally well to process algebras. I provide a Petri net semantics for a standard process algebra extended with signals; the read arcs are necessary to capture those signals.

## **1 Introduction**

May- and must-testing was proposed by De Nicola & Hennessy in [9]. It yields semantic equivalences where two processes are distinguished if and only if they react differently on certain tests. The tests are processes that additionally feature success states. A test T is applied to a process N by taking the CCS parallel composition T | N, and implicitly applying a CCS restriction operator to it that removes the remnants of unsuccessful communication. Applying T to N is deemed successful if and only if this composition yields a process that may, respectively must, reach a success state. It is trivial to recast this definition using the CSP parallel composition ‖<sub>A</sub> [39] instead of the one from CCS.

It is not a priori clear how a given process *must* reach a success state. For all we know it might stay in its initial state and never take any transition leading to this success state. To this end one must employ an assumption saying that under appropriate circumstances certain enabled transitions will indeed be taken. Such an assumption is called a *completeness criterion* [18]. The theory of testing from [9] implicitly employs a default completeness criterion that in [25] is called *progress*. However, one can parameterise the notion of must testing by the choice of any completeness criterion, such as the many notions of *fairness* classified in [25]. Here I employ *justness*, a completeness criterion that is better justified than either progress or fairness [25].

Supported by Royal Society Wolfson Fellowship RSWF\R1\221008.

© The Author(s) 2023

O. Kupferman and P. Sobocinski (Eds.): FoSSaCS 2023, LNCS 13992, pp. 498–519, 2023. https://doi.org/10.1007/978-3-031-30829-1_24

The resulting must-testing equivalence is incomparable to the progress-based one from [9]. On the one hand, it no longer distinguishes deadlock and livelock, i.e., the Petri nets N and N of Ex. 3; on the other hand, it keeps recording information past a divergence. I characterise the corresponding preorder as the fair failure preorder of Vogler [43], which using my terminology ought to be called the *just failures preorder*. I show that it also is the coarsest precongruence preserving linear time properties when assuming justness. Finally I show that the same preorder originates from the timed must-testing framework explored in [43], but only if all quantitative information is removed from that approach.

I carry out this work within the model of Petri nets extended with read arcs [35,7], so that it also applies to process algebras through their standard Petri net semantics. The extension with read arcs is necessary to capture *signalling*, a process algebra operator that cannot be adequately modelled by standard Petri nets. Signalling, or read arcs, can be used to accurately model mutual exclusion without making a fairness assumption [43,8,11]. This is not possible in standard Petri nets [31,43,24], or in process algebras with a standard Petri net semantics [24]. Here I give a Petri net semantics of signalling, and illustrate its use in modelling a traffic light, interacting with passing cars.

**Acknowledgement** I am grateful to Weiyou Wang for valuable feedback.

## **2 Labelled Petri nets with read arcs**

I will employ the following notations for multisets.

**Definition 1** Let X be a set.


With {x, x, y} I denote the multiset over {x, y} with A(x) = 2 and A(y) = 1, rather than the set {x, y} itself. A multiset A with A(x) ≤ 1 for all x is identified with the set {x | A(x) = 1}.
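Finite multisets of this kind can be mirrored with Python's `collections.Counter`. The following is a minimal sketch; the helper names `included` and `as_set` are illustrative and not taken from the paper. It shows the multiset {x, x, y}, multiset inclusion, and the identification of a multiset with a set when every multiplicity is at most 1.

```python
from collections import Counter

# The multiset {x, x, y}: A(x) = 2 and A(y) = 1.
A = Counter({"x": 2, "y": 1})
B = Counter({"x": 1})

def included(a, b):
    """Multiset inclusion a ⊆ b: a(x) <= b(x) for every x."""
    return all(b[x] >= n for x, n in a.items())

def as_set(a):
    """Identify a multiset with A(x) <= 1 for all x with the set {x | A(x) = 1}."""
    if any(n > 1 for n in a.values()):
        raise ValueError("some element occurs more than once")
    return {x for x, n in a.items() if n == 1}

total = A + B    # pointwise sum
diff = A - B     # pointwise difference, truncated at 0
common = A & B   # pointwise minimum
```

`Counter` drops entries whose count falls to zero under `+`, `-` and `&`, which matches reading a multiset as a function into the natural numbers.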

I employ general labelled place/transition systems extended with read arcs [35,7].

**Definition 2** Let 𝒜 be a set of *visible actions* and τ ∉ 𝒜 be an *invisible action*. Let 𝒜<sub>τ</sub> := 𝒜 ∪̇ {τ}. A (*labelled*) *Petri net* (*over* 𝒜<sub>τ</sub>) is a tuple (S, T, F, R, M<sub>0</sub>, ℓ) where


Petri nets are depicted by drawing the places as circles and the transitions as boxes, containing their label. Identities of places and transitions are displayed next to the net element. When F(x, y) > 0 for x, y ∈ S ∪ T there is an arrow (*arc*) from x to y, labelled with the *arc weight* F(x, y). Weights 1 are elided. An element (s, t) of the multiset R is called a *read arc*. Read arcs are drawn as lines without arrowhead. When a Petri net represents a concurrent system, a global state of this system is given as a *marking*, a multiset M of places, depicted by placing M(s) dots (*tokens*) in each place s. The initial state is M0.

The behaviour of a Petri net is defined by the possible moves between markings M and M′, which take place when a finite multiset G of transitions *fires*. In that case, each occurrence of a transition t in G consumes F(s, t) tokens from each place s. Naturally, this can happen only if M makes all these tokens available in the first place. Moreover, for each t ∈ G there need to be at least R(s, t) tokens in each place s that are not consumed when firing G. Next, each t produces F(t, s) tokens in each place s. Definition 4 formalises this notion of behaviour.

**Definition 3** Let N = (S, T, F, R, M<sub>0</sub>, ℓ) be a Petri net. The multisets t̂, •t, t• : S → ℕ are given by t̂(s) = R(s, t), •t(s) = F(s, t) and t•(s) = F(t, s) for all s ∈ S. The elements of t̂, •t and t• are called *read-*, *pre-* and *postplaces* of t, respectively. These functions extend to finite multisets G : T → ℕ by Ĝ := ⋃<sub>t∈G</sub> t̂, •G := Σ<sub>t∈T</sub> G(t) · •t and G• := Σ<sub>t∈T</sub> G(t) · t•.

**Definition 4 ([7])** Let N = (S, T, F, R, M<sub>0</sub>, ℓ) be a Petri net, G ∈ ℕ<sup>T</sup> non-empty and finite, and M, M′ ∈ ℕ<sup>S</sup>. G is a *step* from M to M′, written M [G⟩<sub>N</sub> M′, iff

$$\begin{array}{l} - \ {}^\bullet G + \widehat{G} \subseteq M \ (G \text{ is enabled}) \text{ and} \\ - \ M' = (M - {}^\bullet G) + G^\bullet. \end{array}$$

Note that steps are (finite) multisets, thus allowing self-concurrency, i.e. the same transition can occur multiple times in a single step. One writes M [t⟩<sub>N</sub> M′ for M [{t}⟩<sub>N</sub> M′, whereas M [t⟩<sub>N</sub> abbreviates ∃M′. M [t⟩<sub>N</sub> M′. The subscript N may be omitted if clear from context.
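The step semantics can be made concrete with a small executable sketch. It assumes finite nets stored as dictionaries of `Counter` objects; the keys `pre`, `post`, `read` and all function names are illustrative choices, not notation from the paper. The read places of a step are combined by pointwise maximum, while consumed and produced tokens are summed with multiplicities.

```python
from collections import Counter

def included(a, b):
    """Multiset inclusion a ⊆ b."""
    return all(b[s] >= n for s, n in a.items())

def weighted_sum(arcs, G):
    """Sum over t of G(t) · arcs[t]: gives •G for arcs = pre and G• for arcs = post."""
    total = Counter()
    for t, n in G.items():
        for s, w in arcs[t].items():
            total[s] += n * w
    return total

def read_places(net, G):
    """Read places of the step G: pointwise maximum over the transitions in G."""
    total = Counter()
    for t in G:
        total |= net["read"][t]
    return total

def enabled(net, M, G):
    """G is enabled in M iff the consumed tokens plus the read tokens fit into M."""
    return included(weighted_sum(net["pre"], G) + read_places(net, G), M)

def fire(net, M, G):
    """Return the marking reached from M by the step G."""
    if not G or not enabled(net, M, G):
        raise ValueError("G is empty or not enabled in M")
    return (M - weighted_sum(net["pre"], G)) + weighted_sum(net["post"], G)

# Hypothetical net: t consumes p, reads s and produces q; u consumes s.
net = {
    "pre":  {"t": Counter(p=1), "u": Counter(s=1)},
    "post": {"t": Counter(q=1), "u": Counter()},
    "read": {"t": Counter(s=1), "u": Counter()},
}
M0 = Counter(p=1, s=1)

print(enabled(net, M0, Counter(t=1)))       # t alone
print(enabled(net, M0, Counter(t=1, u=1)))  # u would consume the token that t reads
print(fire(net, M0, Counter(t=1)))
```

In this example the step containing both t and u is rejected because the token in s cannot be consumed by u and read by t at the same time, which is the requirement that read tokens are not consumed when firing G.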

In my Petri nets transitions are labelled with *actions* drawn from a set 𝒜 ∪̇ {τ}. This makes it possible to see these nets as models of *reactive systems* that interact with their environment. A transition t can be thought of as the occurrence of the action ℓ(t). If ℓ(t) ∈ 𝒜, this occurrence can be observed and influenced by the environment, but if ℓ(t) = τ, it cannot and t is an *internal* or *silent* transition. Transitions whose occurrences cannot be distinguished by the environment carry the same label. In particular, since the environment cannot observe the occurrence of internal transitions at all, they are all labelled τ.

In [31,43,24] it was established that mutual exclusion protocols cannot be correctly modelled in standard Petri nets (without read arcs, i.e., satisfying R(s, t) = 0 for all s ∈ S and t ∈ T), unless their correctness becomes contingent on making a fairness assumption. In [24] it was concluded from this that mutual exclusion protocols can likewise not be correctly expressed in standard process algebras such as CCS [34], CSP [6] or ACP [4], at least when sticking to their standard Petri net semantics. Yet Vogler showed that mutual exclusion can be correctly modelled in Petri nets with read arcs [43], and [8,11] demonstrate how mutual exclusion can be correctly modelled in a process algebra extended with *signalling* [3]. Thus signalling adds expressiveness to process algebra that cannot be adequately modelled in terms of standard Petri nets. This is my main reason to use Petri nets with read arcs as system model in this paper.

In many papers on Petri nets, the sets of places and transitions are required to be finite, or at least countable. Here I need a milder restriction, and will limit attention to nets that are finitary in the following sense.

**Definition 5** A Petri net N = (S, T, F, R, M<sub>0</sub>, ℓ) is *finitary* if M<sub>0</sub> is countable, t• is countable for all t ∈ T, and moreover the set of transitions t with •t = ∅ is countable.

## **3 A Petri net semantics of CCSP with signalling**

CCSP [37] is a natural mix of the process algebras CCS [34] and CSP [6], often used in connection with Petri nets. Here I will present a Petri net semantics of a version CCSPS of CCSP enriched with *signalling* [3]. This builds on work from [29,44,27,10,37,38]; the only novelty is the treatment of signalling. Petri net semantics of other process algebras, like CCS [34], CSP [6] or ACP [4], are equally well known. This Petri net semantics lifts any semantic equivalence on Petri nets to CCSPS, or to any other process algebra, so that the results of this work apply equally well to process algebras.

CCSPS is parametrised by the choice of sets 𝒜 of visible actions and 𝒦 of *agent identifiers*. Its syntax is given by

$$P, Q, P_i ::= \sum_{i \in I} a_i P_i \quad \mid \quad a \rhd \sum_{i \in I} a_i P_i \quad \mid \quad P \|_A Q \quad \mid \quad \tau_A(P) \quad \mid \quad f(P) \quad \mid \quad K$$

with a, a<sub>i</sub> ∈ 𝒜, A ⊆ 𝒜, f : 𝒜 → 𝒜 and K ∈ 𝒦. Here the guarded choice Σ<sub>i∈I</sub> a<sub>i</sub>P<sub>i</sub> executes one of the actions a<sub>i</sub>, followed by the process P<sub>i</sub>. The process a ▷ P behaves as P, except that in its initial state it is sending the signal a.<sup>1,2</sup> The process P ‖<sub>A</sub> Q is the partially synchronous parallel composition of processes P and Q, where actions from A can take place only when both P and Q can engage in such an action, while other actions of P and Q occur independently. The abstraction operator τ<sub>A</sub> hides actions from A from the environment by renaming them into τ, whereas f is a straightforward relabelling operator (leaving internal actions alone). Each agent identifier K comes with a defining equation K ≝ P, with P a guarded CCSPS expression; it behaves exactly as the body of its defining equation. Here P is guarded if each occurrence of an agent identifier within P lies in the scope of a guarded choice Σ<sub>i∈I</sub> a<sub>i</sub>P<sub>i</sub> or a ▷ Σ<sub>i∈I</sub> a<sub>i</sub>P<sub>i</sub>.

<sup>1</sup> The notation a ▷ P follows [8]; in [3,11] this is denoted Pˆa.

<sup>2</sup> Here I require P to be a guarded choice in order to avoid the need for a root condition [13] to make the equivalences of this paper into congruences. This is also the reason my language features a guarded choice, instead of action prefixing and general choice.

A formal Petri net semantics of CCSPS, and of each of the operators Σ, ▷, ‖<sub>A</sub>, τ<sub>A</sub> and f, appears in [22, Appendix A]. Here I give an informal summary.

Given nets N<sub>i</sub> for i ∈ I, the net Σ<sub>i∈I</sub> a<sub>i</sub>N<sub>i</sub> is obtained by taking their disjoint union, but without their initial markings (M<sub>0</sub>)<sub>i</sub>, and adding a single marked place r, and for each i ∈ I a fresh transition t<sub>i</sub>, labelled a<sub>i</sub>, with •t<sub>i</sub> = {r}, t̂<sub>i</sub> = ∅ and t<sub>i</sub>• = (M<sub>0</sub>)<sub>i</sub>.

The parallel composition N ‖<sub>A</sub> N′ is obtained out of the disjoint union of N and N′ by dropping from N and N′ all transitions t with ℓ(t) ∈ A, and instead adding synchronisation transitions (t, t′) for each pair of transitions t and t′ from N and N′ with ℓ(t) = ℓ(t′) ∈ A. One has •(t, t′) := •t + •t′, and similarly for the read places and the postplaces (t, t′)• of (t, t′), i.e., all arcs are inherited.
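The construction of the parallel composition can be sketched on finite nets as follows. This is an illustration under assumptions: nets are dictionaries with the hypothetical keys `pre`, `post`, `read`, `label` and `m0`, the two nets are assumed to have disjoint place and transition names already, and the example nets are made up (they are not the nets of Fig. 1).

```python
from collections import Counter

def parallel(n1, n2, sync):
    """Sketch of N ‖_A N′ for two nets with disjoint place and transition names."""
    net = {"pre": {}, "post": {}, "read": {}, "label": {},
           "m0": n1["m0"] + n2["m0"]}
    # Transitions with a label outside the synchronisation set are kept as they are.
    for n in (n1, n2):
        for t in n["pre"]:
            if n["label"][t] not in sync:
                for key in ("pre", "post", "read", "label"):
                    net[key][t] = n[key][t]
    # Each pair of equally labelled transitions from the synchronisation set
    # becomes one synchronisation transition (t, u) that inherits all arcs.
    for t in n1["pre"]:
        for u in n2["pre"]:
            if n1["label"][t] in sync and n1["label"][t] == n2["label"][u]:
                for key in ("pre", "post", "read"):
                    net[key][(t, u)] = n1[key][t] + n2[key][u]
                net["label"][(t, u)] = n1["label"][t]
    return net

# Hypothetical components: a light that signals "drive" while place g is marked,
# and a car that performs "drive" by moving a token from c to c2.
light = {"pre": {"d": Counter()}, "post": {"d": Counter()},
         "read": {"d": Counter(g=1)}, "label": {"d": "drive"},
         "m0": Counter(g=1)}
car = {"pre": {"go": Counter(c=1)}, "post": {"go": Counter(c2=1)},
       "read": {"go": Counter()}, "label": {"go": "drive"},
       "m0": Counter(c=1)}

both = parallel(light, car, {"drive"})
print(both["pre"][("d", "go")], both["read"][("d", "go")])
```

The synchronised transition consumes the car's token and reads the light's token, so in the composed net the car can only drive while g is marked.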

τ<sup>A</sup> and f are renaming operators that only affect the labels of transitions.

The net a ▷ N adds to the net N a single transition u, labelled a, that may fire arbitrarily often, but is enabled in the initial state of N only. To this end, take •u = u• = ∅ and û = M<sub>0</sub>, the initial marking of N. I apply this construction only to nets whose initially marked places have no incoming arcs.
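A minimal sketch of the signalling construction, again on dictionary-based finite nets (the keys `pre`, `post`, `read`, `label`, `m0` and the function names are assumptions made for illustration): the new transition has no pre- or postplaces and reads the whole initial marking, so it can fire repeatedly without changing the marking, and it stops being enabled as soon as an initial token is consumed.

```python
import copy
from collections import Counter

def add_signal(net, action, u="u"):
    """Sketch of a ▷ N: add a transition u labelled `action` that reads M0."""
    if u in net["pre"]:
        raise ValueError("the name of the signal transition must be fresh")
    for post in net["post"].values():
        if any(net["m0"][s] > 0 for s in post):
            raise ValueError("an initially marked place has an incoming arc")
    new = copy.deepcopy(net)
    new["pre"][u] = Counter()
    new["post"][u] = Counter()
    new["read"][u] = Counter(net["m0"])
    new["label"][u] = action
    return new

def enabled(net, M, t):
    need = net["pre"][t] + net["read"][t]
    return all(M[s] >= n for s, n in need.items())

def fire(net, M, t):
    if not enabled(net, M, t):
        raise ValueError("transition is not enabled")
    return (M - net["pre"][t]) + net["post"][t]

# Hypothetical net for a process that performs b once: place r, transition t.
base = {"pre": {"t": Counter(r=1)}, "post": {"t": Counter()},
        "read": {"t": Counter()}, "label": {"t": "b"}, "m0": Counter(r=1)}
sig = add_signal(base, "a")

m0 = sig["m0"]
same = fire(sig, m0, "u")   # emitting the signal leaves the marking unchanged
m1 = fire(sig, m0, "t")     # performing b leaves the initial state
print(same == m0, enabled(sig, m1, "u"))
```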

**Example 1** A traffic light can be modelled by the recursive equation

$$TL \stackrel{def}{=} tr.tg.(drive \rhd ty.TL).$$

Here the actions tr, tg and ty stand for "turn red", "turn green" and "turn yellow", and drive indicates a state where it is OK to drive through. A sequence of two passing cars is modelled as Traffic ≝ drive.drive.**0**. Here **0** stands for the empty sum Σ<sub>i∈∅</sub> a<sub>i</sub>.E<sub>i</sub> and models inaction. In the parallel composition TL ‖<sub>{drive}</sub> Traffic the cars only drive through when the light is green. All three processes are displayed in Fig. 1.

## **4 Justness and other completeness criteria**

**Definition 6** Let N = (S, T, F, R, M<sub>0</sub>, ℓ) be a Petri net. An execution path π is an alternating sequence M<sub>0</sub>t<sub>1</sub>M<sub>1</sub>t<sub>2</sub>M<sub>2</sub> ... of markings and transitions of N, starting with M<sub>0</sub>, and either being infinite or ending with a marking, such that M<sub>i</sub> [t<sub>i+1</sub>⟩<sub>N</sub> M<sub>i+1</sub> for all i < *length*(π). Here *length*(π) ∈ ℕ ∪ {∞} is the number of transitions in π.

Let ℓ(π) ∈ 𝒜<sub>τ</sub><sup>∞</sup> be the string ℓ(t<sub>1</sub>)ℓ(t<sub>2</sub>).... Here 𝒜<sub>τ</sub><sup>∞</sup> denotes the collection of finite and infinite sequences of actions. Moreover, *trace*(π) ∈ 𝒜<sup>∞</sup> is obtained from ℓ(π) by dropping all occurrences of τ.

**Fig. 1.** Traffic passing traffic light

The execution path π is said to *enable* a transition t, notation π[t⟩, if M<sub>k</sub>[t⟩ for some k ∈ ℕ ∧ k ≤ *length*(π) and for all k ≤ j < *length*(π) one has t<sub>j+1</sub> ≠ t and (•t + t̂) ∩ •t<sub>j+1</sub> = ∅.

Path π is B*-just*, for some B ⊆ 𝒜, if ℓ(t) ∈ B for all t ∈ T with π[t⟩.

In the definition of π[t⟩ above one also has M<sub>j+1</sub>[t⟩ for all k ≤ j < *length*(π). Hence, a finite execution path enables a transition iff its final marking does so.

Informally, π[t⟩ holds iff transition t is enabled in some marking on the path π, and after that state no transition of π uses any of the resources needed to fire t. Here the read- and preplaces of t count as such resources. The clause t<sub>j+1</sub> ≠ t moreover counts the transition itself as one of its resources, in the sense that a transition is no longer enabled when it occurs. This clause is redundant for transitions t with •t ≠ ∅. One could interpret this clause as saying that a transition t with •t = ∅ comes with an implicit marked private preplace p<sub>t</sub>, and arcs (p<sub>t</sub>, t) as well as (t, p<sub>t</sub>).
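For finite execution paths, the notions of a path enabling a transition and of B-justness can be checked mechanically. The sketch below is restricted to finite paths and finite nets stored as dictionaries (keys `pre`, `read`, `label` and all function names are illustrative assumptions). A path M<sub>0</sub>t<sub>1</sub>M<sub>1</sub>...t<sub>n</sub>M<sub>n</sub> is passed as a list of n+1 markings and a list of n transitions.

```python
from collections import Counter

def marking_enables(net, M, t):
    need = net["pre"][t] + net["read"][t]
    return all(M[s] >= n for s, n in need.items())

def path_enables(net, markings, fired, t):
    """Does the finite path enable t?  fired[k:] lists the transitions after M_k."""
    resources = set(net["pre"][t]) | set(net["read"][t])
    for k in range(len(fired) + 1):
        if not marking_enables(net, markings[k], t):
            continue
        later = fired[k:]
        # No later transition is t itself or consumes a resource of t.
        if all(u != t and resources.isdisjoint(net["pre"][u]) for u in later):
            return True
    return False

def is_just(net, markings, fired, blockable):
    """B-justness of a finite path: every transition it enables has a label in B."""
    return all(net["label"][t] in blockable
               for t in net["pre"] if path_enables(net, markings, fired, t))

# Hypothetical net: a consumes p; b is a loop on q.
net = {"pre": {"a": Counter(p=1), "b": Counter(q=1)},
       "post": {"a": Counter(), "b": Counter(q=1)},
       "read": {"a": Counter(), "b": Counter()},
       "label": {"a": "a", "b": "b"}}
m0 = Counter(p=1, q=1)
m1 = Counter(q=1)

print(path_enables(net, [m0, m0, m0], ["b", "b"], "a"))  # a stays enabled
print(is_just(net, [m0, m1], ["a"], {"b"}))              # only b remains enabled
```

On these examples the check agrees with the observation above that a finite path enables a transition iff its final marking does. Infinite paths, where justness differs from progress, are outside the scope of this sketch.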

In [18] I posed that Petri nets or transition systems constitute a good model of concurrency only in combination with a *completeness criterion*: a selection of a subset of all execution paths as complete executions, modelling complete runs of the represented system. The default completeness criterion, called *progress* in [25], declares an execution path complete iff it either is infinite, or its final marking enables no transition. An alternative, called *justness* in [25], declares an execution path complete iff it enables no transition. Justness is a *stronger* completeness criterion than progress, in the sense that it deems fewer execution paths complete. The difference is illustrated by the Petri net of Fig. 2(a). There, the execution of an infinite sequence of b-transitions, not involving the a-transition, is complete when assuming progress, but not when assuming justness. In the survey paper [25], 20 different completeness criteria are ordered by strength: progress, justness, and 18 kinds of fairness. Most of the latter are stronger than justness: in Fig. 2(b) the infinite sequence of b-transitions is just but unfair, i.e. incomplete according to these notions of fairness. Whereas justness was a new idea in the context of transition systems [25], it was used as an unnamed default assumption in much work on Petri nets [40]. That justness is better warranted in applications than other completeness criteria has been argued in [25,18,24,17].

**Fig. 2.** (a) Progress vs. justness; (b) Justness vs. fairness; (c) {b}-progress vs. ∅-progress

The mentioned completeness criteria from [25] are all stronger than progress, in the sense that not all infinite execution paths are deemed complete; on the finite execution paths they judge the same. An orthogonal classification is obtained by varying the set B ⊆ 𝒜 of actions that may be blocked by the environment. This fits the reactive viewpoint, in which a visible action can be regarded as a synchronisation between the modelled system and its environment. An environment that is not ready to synchronise with an action b ∈ 𝒜 can be regarded as blocking b. Now B-progress is the criterion that deems a path complete iff it is either infinite, or its final marking M enables only transitions with labels from B. When the environment may block such transitions, it is possible for the system to not progress past M. In Fig. 2(c) the execution that performs only the τ-transition is complete when assuming {b}-progress, but not when assuming ∅-progress. Definition 6 defines B-justness accordingly, and [25] furthermore defines 18 different notions of B-fairness, for any choice of B ⊆ 𝒜. The internal action τ ∉ B can never be blocked by the environment. The default forms of progress and justness described above correspond with ∅-progress and ∅-justness. In [40] blocking and non-blocking transitions are called cold and hot, respectively.
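On finite paths, B-progress only inspects the final marking. The following sketch illustrates this on a made-up net in the spirit of the description of Fig. 2(c), a τ-transition followed by a b-transition; the actual figure is not reproduced here, and the dictionary keys and function names are assumptions.

```python
from collections import Counter

def enabled_transitions(net, M):
    """Transitions whose pre- and read places are covered by the marking M."""
    return [t for t in net["pre"]
            if all(M[s] >= n
                   for s, n in (net["pre"][t] + net["read"][t]).items())]

def progress_complete(net, final_marking, blockable):
    """B-progress for a finite path: its final marking enables only B-labelled transitions."""
    return all(net["label"][t] in blockable
               for t in enabled_transitions(net, final_marking))

# Hypothetical net: tau moves the token from p to q, then b consumes it.
net = {"pre": {"t_tau": Counter(p=1), "t_b": Counter(q=1)},
       "post": {"t_tau": Counter(q=1), "t_b": Counter()},
       "read": {"t_tau": Counter(), "t_b": Counter()},
       "label": {"t_tau": "tau", "t_b": "b"}}

after_tau = Counter(q=1)
print(progress_complete(net, after_tau, {"b"}))   # the environment may block b
print(progress_complete(net, after_tau, set()))   # nothing may be blocked
print(progress_complete(net, Counter(p=1), {"b"}))  # tau can never be blocked
```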

Two subtly different computational interpretations of Petri nets appear in the literature [14]: in the individual token interpretation multiple tokens appearing in the same place are seen as different resources, whereas in the collective token interpretation only the number of tokens in a place is semantically relevant. The difference is illustrated in Fig. 3.

**Fig. 3.** Run a<sup>∞</sup> is just under the individual token interpretation of Petri nets

The idea underlying justness is that once a transition t is enabled, eventually either t will fire, or one of the resources necessary for firing t will be used by some other transition. The execution path π in the net of Fig. 3 that fires the action a infinitely often, but never the action b, is ∅-just by Def. 6. Namely, t<sub>b</sub> is not enabled by π, as (•t<sub>b</sub> + t̂<sub>b</sub>) ∩ •t<sub>a</sub> ≠ ∅. This fits with the individual token interpretation, as in this run it is possible to eventually consume each token that is initially present, and each token that stems from firing transition t<sub>a</sub>. This way any resource available for firing t<sub>b</sub> will eventually be used by some other transition.

When adhering to the collective token interpretation of nets, execution path π could be deemed ∅-unjust, since transition t<sub>b</sub> can fire when there is at least one token in its preplace, and this state of affairs can be seen as a single resource that is never taken away. This might be formalised by adapting the definition of π[t⟩, a path enabling a transition, namely by changing the condition (•t + t̂) ∩ •t<sub>j+1</sub> = ∅ from Def. 6 into •t + t̂ + •t<sub>j+1</sub> ⊆ M<sub>j</sub>. However, this formalisation doesn't capture that after dropping place s from the net of Fig. 3 there is still an infinite run in which b does not occur, namely when regularly firing two *a*s simultaneously. This contradicts the conventional wisdom that firing multiple transitions at once can always be reduced to firing them in some order. To avoid that type of complication, I here stick to the individual token interpretation. Alternatively, one could restrict attention to 1-safe nets [40], on which there is no difference between the individual and collective token interpretations, or to the larger class of *structural conflict nets* [23,21], on which the conditions (•t + t̂) ∩ •t<sub>j+1</sub> = ∅ and •t + t̂ + •t<sub>j+1</sub> ⊆ M<sub>j</sub> are equivalent [21, Section 23.1], so that Def. 6 applies equally well to the collective token interpretation.

## **5 Feasibility**

A standard requirement on fairness assumptions, or completeness criteria in general, is *feasibility* [2], called *machine closure* in [33]. It says that any finite execution path can be extended into a complete one. The following theorem shows that B-justness is feasible indeed.

**Theorem 1** For any B ⊆ 𝒜, each finite execution path of a finitary Petri net can be extended into a B-just path.

*Proof.* Without loss of generality I restrict attention to nets without transitions <sup>t</sup> with •<sup>t</sup> <sup>=</sup> <sup>∅</sup>. Namely, an arbitrary net can be enriched with marked private preplaces p<sup>t</sup> for each such t, and arcs (pt, t) and (t, pt). In essence, this enrichment preserves the collection of execution path of the net, ordered by the relation "is an extension of", the validity of statements <sup>π</sup>[t, and the property of <sup>B</sup>-justness.

I present an algorithm extending any given path $M_0 t_1 M_1 t_2 \dots t_k M_k$ into a $B$-just path $\pi = M_0 t_1 M_1 t_2 M_2 \dots$. The extension only uses transitions $t_i$ with $\ell(t_i) \notin B$. As data structure my algorithm employs an $\mathbb{N} \times \mathbb{N}$-matrix with columns named $i$, for $i \geq k$, where each column has a head and a body. The head of column $k$ contains $M_k$ and its body lists the places $s \in M_k$, leaving most slots empty if there are only finitely many such places. Since the given net is finitary, $M_k$ has only countably many elements, so that they can be listed in the slots of column $k$.

The head of each column $i > k$ with $i - 1 < \mathit{length}(\pi)$ will contain the pair $(t_i, M_i)$ and its body will list the places $s \in M_i$, again leaving most slots empty if there are only finitely many such places. Once more, finitariness ensures that there are enough slots in column $i$.

An entry in the body of the matrix is either (still) empty, filled in with a place, or crossed out. Let $f : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be an enumeration of the entries in the body of this matrix.

At the beginning only column $k$ is filled in; all subsequent columns of the matrix are empty. At each step $i > k$ I first cross out all entries $s$ in the body of the matrix for which there is no transition $t$ with $\ell(t) \notin B$, $M_{i-1}[t\rangle$ and $s \in {}^\bullet t$. In case all entries of the matrix are crossed out, the algorithm terminates, with output $M_0 t_1 M_1 t_2 \dots M_{i-1}$. Otherwise I fill in column $i$ as follows and cross out some more places occurring in the body of the matrix.

I take $n$ to be the smallest value such that entry $f(n) \in \mathbb{N} \times \mathbb{N}$ is already filled in, say with place $r$, but not yet crossed out. By the previous step of the algorithm, $M_{i-1}[t_i\rangle$ for some transition $t_i$ with $\ell(t_i) \notin B$ and $r \in {}^\bullet t_i$. I now fill in $(t_i, M_i)$ in the head of column $i$; here $M_i$ is the unique marking such that $M_{i-1}[t_i\rangle M_i$. Subsequently I cross out all entries in the body of the matrix containing a place $r' \in {}^\bullet t_i$. This includes the entry $f(n)$. Finally, I fill in the body of column $i$ with the places $s \in M_i$.

In case the algorithm doesn't terminate, the desired path $\pi$ is the sequence $\pi = M_0 t_1 M_1 t_2 M_2 \dots$ that is constructed in the limit. It remains to show that $\pi$ is $B$-just.

Towards a contradiction, suppose $\pi[t\rangle$ for a transition $t$ with $\ell(t) \notin B$. By Def. 6 there is an $m \in \mathbb{N}$ with $m \leq \mathit{length}(\pi)$ such that $M_m[t\rangle$ and $({}^\bullet t + \widehat{t}\,) \cap {}^\bullet t_{j+1} = \emptyset$ for all $m \leq j < \mathit{length}(\pi)$. Let $h$ be the smallest such $m$ with $m \geq k$. Then there is a place $r \in {}^\bullet t$ appearing in column $h$. Here I use that ${}^\bullet t \neq \emptyset$. This place was not yet crossed out when column $h$ was constructed. Since $r \notin {}^\bullet t_{j+1}$ and $M_{j+1}[t\rangle$ for all $h \leq j < \mathit{length}(\pi)$, place $r$ will never be crossed out. It follows that $\pi$ must be infinite. The entry $r$ in column $h$ is enumerated as $f(n)$ for some $n \in \mathbb{N}$, and is eventually reached by the algorithm and crossed out. In this regard the matrix acts as a priority queue. This yields the required contradiction.
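The construction in this proof can be sketched for finite nets as follows. This is a simplification under stated assumptions rather than the algorithm in full generality: read arcs are ignored, every transition has a nonempty preset, place names are sortable strings, the internal action is the label `"tau"`, and since every column is finite the entries are simply enumerated column by column instead of through an enumeration of an infinite matrix. All function and parameter names are hypothetical, and the run is cut off after `max_steps` firings.

```python
def enabled(marking, preset):
    """M[t> without read arcs: each preplace carries enough tokens."""
    return all(marking.get(s, 0) >= n for s, n in preset.items())


def fire(marking, preset, postset):
    """Return the unique marking reached by firing an enabled transition."""
    new = dict(marking)
    for s, n in preset.items():
        new[s] -= n
    for s, n in postset.items():
        new[s] = new.get(s, 0) + n
    return {s: n for s, n in new.items() if n > 0}


def extend_just(pre, post, label, blocked, marking, max_steps):
    """Extend the path ending in `marking`, using only non-blocked transitions."""
    # Body of the matrix in enumeration order: [place, crossed_out].
    entries = [[s, False] for s in sorted(marking)]
    fired = []
    for _ in range(max_steps):
        usable = [t for t in pre
                  if label[t] not in blocked and enabled(marking, pre[t])]
        needed = {s for t in usable for s in pre[t]}
        # Cross out entries that no usable enabled transition consumes.
        for entry in entries:
            if entry[0] not in needed:
                entry[1] = True
        pending = [e for e in entries if not e[1]]
        if not pending:
            break                      # the path built so far is B-just
        r = pending[0][0]              # oldest entry not yet crossed out
        t = next(t for t in usable if r in pre[t])
        marking = fire(marking, pre[t], post[t])
        fired.append(t)
        # Cross out every entry holding a preplace of the fired transition.
        for entry in entries:
            if entry[0] in pre[t]:
                entry[1] = True
        # Fill in the next column with the places of the new marking.
        entries.extend([s, False] for s in sorted(marking))
    return fired, marking


# Usage: two independent self-loops; neither may be neglected forever.
pre = {"t1": {"p1": 1}, "t2": {"p2": 1}}
post = {"t1": {"p1": 1}, "t2": {"p2": 1}}
label = {"t1": "a", "t2": "b"}
run, _ = extend_just(pre, post, label, set(), {"p1": 1, "p2": 1}, 6)
```

Serving the oldest pending entry first is what makes the list behave as the priority queue mentioned in the proof: a place needed by a perpetually enabled transition is eventually reached.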

The above proof is a variant of [18, Thm. 1], which itself is a variant of [25, Thm. 6.1]. The side condition of finitariness is essential, as the counterexample below shows.

**Example 2** Let $N = (S, T, F, R, M_0, \ell)$ be the net with $T = \{t_r \mid r \in \mathbb{R}\}$, $S = \{s_r \mid r \in \mathbb{R}\}$, $M_0(s_r) = 1$, $\ell(t_r) = \tau$, ${}^\bullet t_r = \{s_r\}$ and $\widehat{t_r} = t_r^\bullet = \emptyset$ for each $r \in \mathbb{R}$. It contains uncountably many action transitions, each with a marked private preplace. As each execution path $\pi$ contains only countably many transitions, many transitions remain enabled by $\pi$, so that $\pi$ is not $B$-just for any $B \subseteq A$.

## **6 The coarsest preorders preserving linear time properties**

A *linear time property* is a predicate on system runs, and thus also on the execution paths of Petri nets. One writes $\pi \models \varphi$ if the execution path $\pi$ satisfies the linear-time property $\varphi$. As the observable behaviour of an execution path $\pi$ of a Petri net is deemed to be $trace(\pi)$, in this context one studies only linear time properties $\varphi$ such that

$$trace(\pi) = trace(\pi') \quad \Rightarrow \quad (\pi \models \varphi \Leftrightarrow \pi' \models \varphi) \,. \tag{1}$$

For this reason, a linear time property can be defined or characterised as a subset of $A^\infty$.

Linear time properties can be used to formalise correctness requirements on systems. They are deemed to hold for (or be satisfied by) a system iff they hold for all its complete runs. Following [20] I write $D \models^{CC} \varphi$ iff property $\varphi$ holds for all runs of the distributed system $D$ that are complete according to the completeness criterion $CC$, and $N \models^{CC} \varphi$ iff it holds for all execution paths of the Petri net $N$ that are complete according to $CC$. Prior to [20], $\models$ was a binary predicate between systems (or system representations such as Petri nets) and properties; in this setting the default completeness criterion of Section 4 was used. When using a completeness criterion $B$-$C$, where $C$ is one of the 20 completeness criteria classified in [25] and $B \subseteq A$ is a modifier of $C$ based on the set $B$ of actions that may be blocked by the environment, $N \models^{B\text{-}C} \varphi$ is written $N \models^C_B \varphi$ [20]. In this paper I am mostly interested in the values $Pr$ and $J$ of $C$, standing for progress and justness, respectively. To be consistent with previous work on temporal logic, $N \models \varphi$ is a shorthand for $N \models^{Pr}_\emptyset \varphi$.

For each completeness criterion $B$-$C$, let $\sqsubseteq^C_B$ be the coarsest preorder that preserves linear time properties when assuming $B$-$C$. Moreover, $\sqsubseteq^C$ is the coarsest preorder that preserves linear time properties when assuming completeness criterion $C$ in each environment, meaning regardless of which set of actions $B$ can be blocked.

**Definition 7** Write $N \sqsubseteq^C_B N'$ iff $N \models^C_B \varphi \Rightarrow N' \models^C_B \varphi$ for all linear time properties $\varphi$. Write $N \sqsubseteq^C N'$ iff $N \sqsubseteq^C_B N'$ for all $B \subseteq A$.

It is trivial to give a more explicit characterisation of these preorders. To preserve the analogy with the failure pairs of CSP [6], instead of sets $B \subseteq A$ I will record their complements $\overline{B} := A \setminus B$. As $\overline{\overline{B}} = B$, such sets carry the same information. Since $B$ contains the actions that may be blocked by the environment, meaning that we consider environments that in any state may decide which actions from $B$ to block, the set $\overline{B} \cup \{\tau\}$ contains the actions that may not be blocked by the environment. This means that we only consider environments that in any state are willing to synchronise with any action in $\overline{B}$.

**Definition 8** For completeness criterion $C$, $B$ ranging over $\mathcal{P}(A)$, and Petri net $N$, let

$$\begin{array}{l} \mathcal{F}^C(N) := \{ (\sigma, \overline{B}) \mid N \text{ has a } B\text{-}C\text{-complete execution path } \pi \text{ with } \sigma = trace(\pi) \} \\ \mathcal{F}^C_B(N) := \{ \sigma \mid N \text{ has a } B\text{-}C\text{-complete execution path } \pi \text{ with } \sigma = trace(\pi) \} . \end{array}$$

An element $(\sigma, X)$ of $\mathcal{F}^C(N)$ could be called a $C$-failure pair of $N$, because it indicates that the system represented by $N$, when executing a path with visible content $\sigma$, may fail to execute additional actions from $X$, even when all these actions are offered by the environment, in the sense that the environment is perpetually willing to partake in those actions. Note that if $(\sigma, X) \in \mathcal{F}^C(N)$ and $Y \subseteq X$ then $(\sigma, Y) \in \mathcal{F}^C(N)$.

**Proposition 1** $N \sqsubseteq^C_B N'$ iff $\mathcal{F}^C_B(N) \supseteq \mathcal{F}^C_B(N')$. Likewise, $N \sqsubseteq^C N'$ iff $\mathcal{F}^C(N) \supseteq \mathcal{F}^C(N')$.

Proof. Suppose $N \sqsubseteq^C_B N'$ and $\sigma \notin \mathcal{F}^C_B(N)$. Let $\varphi$ be the linear time property satisfying $\pi \models \varphi$ iff $trace(\pi) \neq \sigma$. Then $N \models^C_B \varphi$ and thus $N' \models^C_B \varphi$. Hence $\sigma \notin \mathcal{F}^C_B(N')$.

Suppose $N \not\sqsubseteq^C_B N'$. Then there exists a linear time property $\varphi$ such that $N \models^C_B \varphi$, yet $N' \not\models^C_B \varphi$. Let $\pi'$ be a $B$-$C$-complete execution path of $N'$ such that $\pi' \not\models \varphi$, and let $\sigma = trace(\pi')$. By (1) $\pi \not\models \varphi$ for any execution path $\pi$ (of any net) such that $trace(\pi) = \sigma$. Hence $\sigma \in \mathcal{F}^C_B(N')$, yet $\sigma \notin \mathcal{F}^C_B(N)$. It follows that $\mathcal{F}^C_B(N) \not\supseteq \mathcal{F}^C_B(N')$.

The second statement follows as a corollary of the first, using that $\mathcal{F}^C(N) \supseteq \mathcal{F}^C(N')$ iff $\mathcal{F}^C_B(N) \supseteq \mathcal{F}^C_B(N')$ for all $B \subseteq A$.
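The characterisation can be made concrete for nets whose failure sets happen to be finite and are given explicitly. The sketch below assumes that traces are tuples of action names and that the second component of a pair is a `frozenset`; the function names and the sample sets are hypothetical and serve only as illustration.

```python
def traces_for(failures, alphabet, blocked):
    """Project F^C(N) to F^C_B(N): keep traces paired with the complement of B."""
    complement = frozenset(alphabet) - frozenset(blocked)
    return {sigma for (sigma, x) in failures if x == complement}


def below(f_left, f_right):
    """Reverse inclusion: the left net is below the right one iff F(left) contains F(right)."""
    return f_left >= f_right


# Usage with made-up failure sets over the alphabet {a, b}.
alphabet = {"a", "b"}
f_n = {((), frozenset()),
       (("a",), frozenset()),
       (("a",), frozenset({"b"}))}
f_n_prime = {(("a",), frozenset())}
```

Because the pairs record complements of blocked sets, projecting with a larger `blocked` set selects pairs with a smaller second component.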

The preorders $\sqsubseteq^C_B$ can be classified as linear time semantics [12], as they are characterised through reverse trace inclusions. The preorders $\sqsubseteq^C$ on the other hand capture a minimal degree of branching time. This is because they should be ready for different choices of a system's environment at runtime.

Note that $\sqsubseteq^C$ is contained in $\sqsubseteq^C_B$ for each $B \subseteq A$, in the sense that $N \sqsubseteq^C N'$ implies $N \sqsubseteq^C_B N'$. There is a priori no reason to assume inclusions between preorders $\sqsubseteq^C$ and $\sqsubseteq^D$ when $D$ is a stronger completeness criterion than $C$.

To relate the preorders $\sqsubseteq^C_B$ and $\sqsubseteq^C$ with ones established in the literature, I consider the case $C = Pr$, i.e., taking progress as the completeness criterion $C$. The preorder $\sqsubseteq^{Pr}_\emptyset$ is characterised as reverse inclusion of complete traces, where completeness is w.r.t. the default completeness criterion of Section 4. These complete traces include


Deadlock and divergence traces are not distinguished. This corresponds with what is called divergence sensitive trace semantics ($T^\lambda$) in [12]. The above concept of complete traces of a process $p$ is the same as in [15], there denoted $CT(p)$.

The preorder $\sqsubseteq^{Pr}_A$ is characterised as reverse inclusion of infinite and partial traces, i.e., the traces of all execution paths. This corresponds with what is called infinitary trace semantics ($T^\infty$) in [12]. It is strictly coarser (making more identifications) than $T^\lambda$.

To analyse the preorder $\sqsubseteq^{Pr}$, one has $(\sigma, X) \in \mathcal{F}^{Pr}(N)$ if either


**–** $\sigma$ is the trace of a finite path of $N$ whose end-marking enables no transition $t$ with $\ell(t) \in X$.

The resulting preorder does not occur in [12]; it can be placed strictly between *divergence sensitive failure semantics* ($F^\Delta$) and *divergence sensitive trace semantics* ($T^\lambda$).

The entire family of preorders $\sqsubseteq^C_B$ and $\sqsubseteq^C$ proposed in this section was inspired by its most interesting family member, $\sqsubseteq^J$ (i.e., taking justness as the completeness criterion $C$), proposed earlier by Walter Vogler [43, Def. 5.6], also on Petri nets with read arcs. Vogler [43] uses the word *fair* for what I call *just*. I believe the choice of the word "just" is warranted to distinguish the concept from the many other kinds of fairness that appear in the literature, which are all of a very different nature. Accordingly, Vogler calls the semantics induced by $\sqsubseteq^J$ the *fair failure* semantics, whereas I call it the *just failures* semantics. My set $\mathcal{F}^J(N)$ is called $FF(N)$ in [43], and Vogler addresses $\sqsubseteq^J$ simply as $FF$-inclusion, thereby defining it via the right-hand side of Prop. 1.

## **7 Congruence properties**

A preorder $\sqsubseteq$ is called a *precongruence* for an $n$-ary operator *Op*, if $N_i \sqsubseteq N'_i$ for $i = 1, \dots, n$ implies that $\mathit{Op}(N_1, \dots, N_n) \sqsubseteq \mathit{Op}(N'_1, \dots, N'_n)$. In this case the operator *Op* is said to be *monotone* w.r.t. the preorder $\sqsubseteq$. Being a precongruence for important operators is known to be a valuable tool in compositional verification [41].

I write $\equiv$ for the kernel of $\sqsubseteq$, that is, $N \equiv N'$ iff $N \sqsubseteq N' \wedge N' \sqsubseteq N$. This convention extends to the decorated preorders, so that $\equiv^C_B$ is the kernel of $\sqsubseteq^C_B$. If $\sqsubseteq$ is a precongruence for *Op*, then $\equiv$ is a *congruence* for *Op*, meaning that $N_i \equiv N'_i$ for $i = 1, \dots, n$ implies that $\mathit{Op}(N_1, \dots, N_n) \equiv \mathit{Op}(N'_1, \dots, N'_n)$.

The preorder $\sqsubseteq^{Pr}_A$, characterised as reverse inclusion of infinite and partial traces, is well-known to be a precongruence for the operators of CCSP. However, none of the other preorders $\sqsubseteq^{Pr}_B$, nor $\sqsubseteq^{Pr}$, is a precongruence for parallel composition.

**Example 3** Let $N$ = • , $N'$ = • τ and $T$ = • w . The definition of $\|_\emptyset$ yields $T \|_\emptyset N$ = • • w and $T \|_\emptyset N'$ = • τ • w . One has $N \equiv^{Pr} N'$, and thus also $N \equiv^{Pr}_B N'$, for each $B \subseteq A$. Namely $\mathcal{F}^{Pr}(N) = \mathcal{F}^{Pr}(N') = \{(\varepsilon, X) \mid X \subseteq A\}$. Here $\varepsilon$ denotes the empty string. When fixing $B$ such that $B \neq A$ one may choose $w \notin B$. Now $\varepsilon \in \mathcal{F}^{Pr}_B(T \|_\emptyset N')$, for this process has an infinite execution path that avoids the $w$-transition, which generates a divergence trace $\varepsilon$. Yet $\varepsilon \notin \mathcal{F}^{Pr}_B(T \|_\emptyset N)$. Hence $T \|_\emptyset N \not\sqsubseteq^{Pr}_B T \|_\emptyset N'$, and thus also $T \|_\emptyset N \not\sqsubseteq^{Pr} T \|_\emptyset N'$. So neither $\sqsubseteq^{Pr}_B$ nor $\sqsubseteq^{Pr}$ is a precongruence for $\|_\emptyset$.

A common solution to the problem of a preorder $\sqsubseteq$ not being a precongruence for certain operators is to instead consider its *congruence closure*, defined as the largest precongruence contained in $\sqsubseteq$.

In [30,15] the congruence closure of $\sqsubseteq^{Pr}$ is characterised as the so-called NDFD preorder $\sqsubseteq^{NDFD}$. Here $N \sqsubseteq^{NDFD} N'$ iff $N \sqsubseteq^{Pr} N'$ (characterised in the previous section) and moreover the divergence traces of $N'$ are included in those of $N$. As remarked in [15], here it does not matter whether one requires congruence closure merely w.r.t. parallel composition and injective relabelling, or w.r.t. all operators of CSP (or CCSP, or anything in between).

Unlike $\sqsubseteq^{Pr}$, the preorder $\sqsubseteq^J$ is a precongruence for parallel composition. Although this has been proven already by Vogler [43], in [22, Appendix B] I provide a proof that bypasses the auxiliary notion of urgent transitions, and provides more details.

**Proposition 2 ([43])** $\sqsubseteq^J$ is a precongruence for relabelling and abstraction.

Proof. This follows since $\mathcal{F}^J(f(N)) = \{(f(\sigma), X) \mid (\sigma, f^{-1}(X)) \in \mathcal{F}^J(N)\}$ and moreover $\mathcal{F}^J(\tau_I(N)) = \{(\tau_I(\sigma), X) \mid (\sigma, X \cup I) \in \mathcal{F}^J(N)\}$. Here $\tau_I(\sigma)$ is the result of pruning all $I$-actions from $\sigma \in A^\infty$.
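The second identity in this proof can be turned into a small computation on explicitly given failure sets. The sketch below is an illustration under assumptions: it handles finite traces only (tuples of action names), represents the sets $X$ as `frozenset`s, and uses the hypothetical function names `prune` and `hide`.

```python
from itertools import combinations


def prune(sigma, hidden):
    """tau_I(sigma): drop all hidden actions from a finite trace."""
    return tuple(a for a in sigma if a not in hidden)


def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def hide(failures, hidden):
    """Apply {(tau_I(sigma), X) | (sigma, X ∪ I) in F} to a finite set F."""
    hidden = frozenset(hidden)
    result = set()
    for sigma, z in failures:
        if hidden <= z:                       # z has the form X ∪ I
            for extra in subsets(hidden):     # every X with X ∪ I == z
                result.add((prune(sigma, hidden), (z - hidden) | extra))
    return result


# Usage with a made-up failure set: hiding b.
sample = {(("a", "b"), frozenset({"b", "c"})),
          (("a",), frozenset({"c"}))}
hidden_sample = hide(sample, {"b"})
```

A pair whose second component does not contain all hidden actions contributes nothing, which reflects that hidden actions can never be blocked by the environment.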

Trivially, -<sup>J</sup> also is a precongruence for aiP<sup>i</sup> and a aiPi.

The preorder $\sqsubseteq^J_A$ can be seen to coincide with $\sqsubseteq^{Pr}_A$, characterised as reverse inclusion of infinite and partial traces, and thus is a precongruence for the operators of CCSP. Leaving open the case $|A \setminus B| = 1$, the preorders $\sqsubseteq^J_B$ with $|A \setminus B| \geq 2$ fail to be precongruences for parallel composition.

**Example 4** Take $b, c \notin B$. Let $N$, $N'$ and $T$ be as shown in Fig. 4. Then $N \equiv^J_B N'$, as $\mathcal{F}^J_B(N) = \mathcal{F}^J_B(N') = \{\varepsilon, ab, ac\}$. (Whether $\varepsilon$ is included depends on whether $a \in B$.) Yet $T \|_A N \not\equiv^J_B T \|_A N'$, as $a \in \mathcal{F}^J_B(T \|_A N')$, yet $a \notin \mathcal{F}^J_B(T \|_A N)$.

**Fig. 4.** The preorders $\sqsubseteq^J_B$ with $|A \setminus B| \geq 2$ fail to be precongruences for parallel composition

Moreover, as illustrated below, the preorders $\sqsubseteq^J_B$ with $B \neq \emptyset$ and $|A \setminus B| \geq 1$ fail to be precongruences for abstraction. In the next section I will show that, for $A$ infinite and $B \neq A$, the congruence closure of $\sqsubseteq^J_B$ for parallel composition, abstraction and relabelling is $\sqsubseteq^J$.

**Example 5** Take $b \in B$ and $c \notin B$. Let $N$ and $N'$ be as shown in Fig. 5. Then $N \equiv^J_B N'$, as $\mathcal{F}^J_B(N) = \mathcal{F}^J_B(N') = \{\varepsilon, bc\}$. Yet $\tau_{\{b\}}(N) \not\equiv^J_B \tau_{\{b\}}(N')$, since $\varepsilon \in \mathcal{F}^J_B(\tau_{\{b\}}(N'))$, yet $\varepsilon \notin \mathcal{F}^J_B(\tau_{\{b\}}(N))$.

**Fig. 5.** The preorders $\sqsubseteq^J_B$ with $\emptyset \neq B \neq A$ fail to be precongruences for abstraction

## **8 Must Testing**

A *test* is a Petri net, but featuring a special action $w \notin A_\tau$, not used elsewhere. This action is used to mark *success markings*: those in which $w$ is enabled. If $T$ is a test and $N$ a net then $\tau_A(T \|_A N)$ is also a test. An execution path of $\tau_A(T \|_A N)$ is *successful* iff it contains a success marking.

**Definition 9** A Petri net $N$ *may pass* a test $T$, notation $N$ **may** $T$, if $\tau_A(T \|_A N)$ has a successful execution path. It *must pass* $T$, notation $N$ **must** $T$, if each complete execution path of $\tau_A(T \|_A N)$ is successful. It *should pass* $T$, notation $N$ **should** $T$, if each finite execution path of $\tau_A(T \|_A N)$ can be extended into a successful execution path.

Write $N \sqsubseteq_{\text{must}} N'$ if $N$ **must** $T$ implies $N'$ **must** $T$ for each test $T$. The preorders $\sqsubseteq_{\text{may}}$ and $\sqsubseteq_{\text{should}}$ are defined similarly.

The may- and must-testing preorders stem from De Nicola & Hennessy [9], whereas should-testing was added independently in [5] and [36].
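The three ways of passing a test can be compared on a finite state graph standing in for the marking graph of the composed test and net. The sketch below is an illustration under assumptions, not the definition on Petri nets itself: the graph is a plain dictionary from states to successor lists, success markings are given as a set of states, and "complete" is read as the progress criterion (a path is complete if it is infinite or ends in a state without successors). All names are hypothetical.

```python
def reachable(graph, start):
    """All states reachable from `start`, including `start` itself."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(graph.get(state, []))
    return seen


def may_pass(graph, init, success):
    """Some execution path contains a success state."""
    return bool(reachable(graph, init) & success)


def should_pass(graph, init, success):
    """Every finite path can be extended into a successful one."""
    return all(reachable(graph, s) & success for s in reachable(graph, init))


def must_pass(graph, init, success):
    """Every complete path (under progress) contains a success state."""
    colour = {}

    def escapes(state):
        # True iff a complete path from `state` avoids all success states.
        if state in success:
            return False
        if colour.get(state) == "open":
            return True                # cycle that never reaches success
        if colour.get(state) == "done":
            return False
        successors = graph.get(state, [])
        if not successors:
            return True                # deadlock before reaching success
        colour[state] = "open"
        found = any(escapes(nxt) for nxt in successors)
        colour[state] = "done"
        return found

    return not escapes(init)


# Usage: state 2 is the only success state; the loop 0 -> 1 -> 0 can be left.
looping = {0: [1], 1: [0, 2], 2: []}
diverging = {0: [1, 2], 1: [3], 2: [2], 3: []}
straight = {0: [1], 1: [2], 2: []}
```

On `looping` the should-test succeeds while the must-test fails, since the infinite path around the cycle is complete under progress yet never successful.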

In the original work on testing [9] the CCS parallel composition $T|N$ was used instead of the concealed CCSP parallel composition $\tau_A(T \|_A N)$; moreover, only those execution paths consisting solely of internal actions mattered for the definitions of passing a test. The present approach is equivalent. First of all, restricting attention to execution paths of $T|N$ consisting solely of internal actions is equivalent to putting $T|N$ in the scope of a CCS restriction operator $\backslash A$ [34], for that operator drops all transitions of its argument that are not labelled $\tau$ or $w$. Secondly, CCS features a complementary action $\bar{a}$ for each $a \in A$, and one has $\bar{\bar{a}} = a$. For $T$ a test, let $\overline{T}$ denote the complementary test in which each action $a \in A$ is replaced by $\bar{a}$; again $\overline{\overline{T}} = T$. It follows directly from the definitions of the operators involved that $\tau_A(T \|_A N)$ is identical<sup>3</sup> to $(\overline{T}|N) \backslash A$. This proves the equivalence of the two approaches.

<sup>3</sup> The standard definition of $|$ on Petri nets [28] is given only up to isomorphism. By choosing the names of places and transitions similar to those in the definition of $\|_A$ from [22, Appendix A] one can obtain $\tau_A(T \|_A N) = (\overline{T}|N) \backslash A$.

Unlike may- and should-testing, the concept of must-testing is naturally parametrised with a completeness criterion, deciding what counts as a complete execution. To make this choice explicit I use the notation $\sqsubseteq^C_{\text{must}}$, where $C$ could be any of the completeness criteria surveyed in [25]. Since processes $\tau_A(T \|_A N)$ (or $(\overline{T}|N) \backslash A$) do not feature any actions other than $\tau$ and $w$, where $w$ is used merely to point to the success states, the modifier $B \subseteq A$ of a completeness criterion $B$-$C$ has no effect, i.e., any two choices of this modifier are equivalent.

In the original work of [9] the default completeness criterion progress from Section 4 was employed. Interestingly, $\sqsubseteq^{Pr}_{\text{must}}$ is a congruence for the operators of CCSP that does not preserve all linear time properties. It is strictly coarser than $\sqsubseteq^{NDFD}$. In fact, it is the coarsest precongruence for the CCSP parallel composition and injective relabelling that preserves those linear time properties that express that a system will eventually reach a state in which something [good] has happened [15]. (In [15], following [32], but deviating from the standard terminology of [1], such properties are called liveness properties.)

In this paper I investigate the must-testing preorder when taking justness as the underlying completeness criterion, $\sqsubseteq^J_{\text{must}}$. Thm. 2 below shows that it can be characterised as the just failures preorder $\sqsubseteq^J$ of Section 6.

First note that Def. 9 can be simplified. When dealing with justness as completeness criterion, the word "complete" in Def. 9 is instantiated by "just" or "$B$-just", for some $B \subseteq A$ (not including $w$). As the result is independent of $B$, one may take $B := \emptyset$. Since the labelling of a net has no bearing on its execution paths, or on whether such a path is $\emptyset$-just, or successful, one may now drop the operator $\tau_A$ from Def. 9 without affecting the resulting notion of must testing.

#### **Theorem 2** $N \sqsubseteq^J_{\text{must}} N'$ iff $N \sqsubseteq^J N'$.

Proof. The "if" direction is established in [22, Appendix C].

For "only if", suppose $N \sqsubseteq^J_{\text{must}} N'$. Using Prop. 1, it suffices to show that $\mathcal{F}^J(N) \supseteq \mathcal{F}^J(N')$. Let $(\sigma, X) \in \mathcal{F}^J(N')$, where $\sigma = a_1 a_2 \dots \in A^\infty$ is a finite or infinite sequence of actions. Let $T$ be the test displayed in Fig. 6. The drawing is for the case that $\sigma = a_1 a_2 \dots a_n$ is finite; in the infinite case, there is no need to display $a_n$ separately. Now $K$ **must** $T$, for any net $K$, when using justness as completeness criterion, iff each $\emptyset$-just execution path of $T \|_A K$ is successful, which is the case iff $(\sigma, X) \notin \mathcal{F}^J(K)$. (In other words, $T \|_A K$ has an unsuccessful $\emptyset$-just execution path iff $(\sigma, X) \in \mathcal{F}^J(K)$. For the meaning of $(\sigma, X) \in \mathcal{F}^J(K)$ is that $K$ has an execution path $\pi$ with $trace(\pi) = \sigma$ such that $\ell_K(t) \in X \Rightarrow \neg\pi[t\rangle$.) Hence $N'$ **must not** $T$ and thus $N$ **must not** $T$, and thus $(\sigma, X) \in \mathcal{F}^J(N)$.

**Proposition 3** Let $A$ be infinite and $B \neq A$. Then $\sqsubseteq^J$ is the congruence closure of $\sqsubseteq^J_B$ for parallel composition, abstraction and injective relabelling.

Proof. Pick an action $w \in A \setminus B$. Assume $N \not\sqsubseteq^J N'$. By applying an injective relabelling, one can assure that $w$ does not occur in $N$ or $N'$. Let $(\sigma, X) \in \mathcal{F}^J(N')$, yet $(\sigma, X) \notin \mathcal{F}^J(N)$, with $w \notin X$. Let $T$ be the net of Fig. 6. Then, writing $A^- := A \setminus \{w\}$, $(\sigma, A) \in \mathcal{F}^J(T \|_{A^-} N')$, yet $(\sigma, A) \notin \mathcal{F}^J(T \|_{A^-} N)$. Moreover, $(\rho, A) \notin \mathcal{F}^J(T \|_{A^-} N')$ and $(\rho, A) \notin \mathcal{F}^J(T \|_{A^-} N)$ for any $\rho \neq \sigma$ not containing the action $w$. Hence, applying the proof of Prop. 2, using that $A^- \cup \overline{B} = A$, one has $(\varepsilon, \overline{B}) \in \mathcal{F}^J(\tau_{A^-}(T \|_{A^-} N'))$, yet $(\varepsilon, \overline{B}) \notin \mathcal{F}^J(\tau_{A^-}(T \|_{A^-} N))$. Thus $\varepsilon \in \mathcal{F}^J_B(\tau_{A^-}(T \|_{A^-} N'))$, yet $\varepsilon \notin \mathcal{F}^J_B(\tau_{A^-}(T \|_{A^-} N))$. It follows that $\tau_{A^-}(T \|_{A^-} N) \not\sqsubseteq^J_B \tau_{A^-}(T \|_{A^-} N')$.

**Fig. 6.** Universal test for just must testing

## **9 Timed must-testing**

A timed form of must-testing was proposed by Vogler in [43]. Justness says that each transition that gets enabled must fire eventually, unless one of its necessary resources will be taken away. In Vogler's framework, each transition $t$ must fire within 1 unit of time after it becomes enabled, even though it can fire faster. The implicit timer is reset each time $t$ becomes disabled and enabled again, by another transition taking a token and returning it to one of the preplaces of $t$. Since there is no lower bound on the time that may elapse before a transition fires, this view encompasses the same asynchronous behaviour of nets as under the assumption of justness.

Vogler's work only pertains to *safe* nets: those with the property that no reachable marking allocates multiple tokens to the same place. Here a marking is *reachable* if it occurs in some execution path. Transitions $t$ with ${}^\bullet t = \emptyset$ are excluded. Although he only considered finite nets, here I apply his work unchanged to *finitely branching* nets: those in which only finitely many transitions are enabled in each reachable marking.

**Definition 10 ([43])** A *continuous(ly timed) instantaneous description (CID)* of a net $N$ is a pair $(M, \xi)$ consisting of a marking $M$ of $N$ and a function $\xi$ mapping the transitions enabled under $M$ to $[0, 1]$; $\xi$ describes the residual activation time of an enabled transition.

The initial CID is $CID_0 = (M_0, \xi_0)$ with $\xi_0(t) = 1$ for all $t$ with $M_0[t\rangle$. One writes $(M, \xi)[\eta\rangle(M', \xi')$ if one of the following cases applies:


A timed execution path $\pi$ is an alternating sequence of CIDs and elements $t \in T$ or $r \in \mathbb{R}^+$, defined just like an execution path in Def. 6. Let $\zeta(\pi) \in \mathbb{R} \cup \{\infty\}$ be the sum of all time steps in a timed execution path $\pi$, the duration of $\pi$.

A timed test is a pair $(T, D)$ of a test $T$ and a duration $D \in \mathbb{R}^+_0$. A net must pass a timed test $(T, D)$, notation $N$ **must** $(T, D)$, if each timed execution path $\pi$ with $\zeta(\pi) > D$ contains a transition labelled $w$. Write $N \sqsubseteq^{\text{timed}}_{\text{must}} N'$ if $N$ **must** $(T, D)$ implies $N'$ **must** $(T, D)$ for each timed test $(T, D)$.

Vogler shows that the preorder $\sqsubseteq^{\text{timed}}_{\text{must}}$ is strictly finer than $\sqsubseteq^J$. In fact, although $\tau.a.\mathbf{0} \equiv^J a.\mathbf{0}$, one has $\tau.a.\mathbf{0} \not\equiv^{\text{timed}}_{\text{must}} a.\mathbf{0}$, since only the latter process must pass the timed test $(a.w, 2)$. Here I use that each of the actions $\tau$, $a$ and $w$ may take up to 1 unit of time to occur. A statement $N \sqsubseteq^{\text{timed}}_{\text{must}} N'$ says that $N'$ is faster than $N$, in the sense that composed with a test it is guaranteed to reach success states in less time than $N$.
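The arithmetic behind this example can be spelled out as follows. This is a sketch assuming, as stated above, that each of $\tau$, $a$ and $w$ fires at most 1 time unit after becoming enabled, that time may elapse up to that bound before each firing, and that in the composition with the test $a.w$ these transitions become enabled one after the other. The worst-case delay before $w$ occurs is then

$$\underbrace{1}_{a} + \underbrace{1}_{w} = 2 \quad \text{for } a.\mathbf{0}, \qquad \underbrace{1}_{\tau} + \underbrace{1}_{a} + \underbrace{1}_{w} = 3 > 2 \quad \text{for } \tau.a.\mathbf{0}.$$

So every timed execution path of $a.\mathbf{0}$ under the test whose duration exceeds 2 contains $w$, whereas $\tau.a.\mathbf{0}$ admits, for instance, a $w$-free path of duration $2.5$: wait 1, fire $\tau$, wait 1, fire $a$, wait $0.5$.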

Here I show that when abstracting from the quantitative dimension of timed must-testing, it exactly characterises $\sqsubseteq^J$.

**Definition 11** A net $N$ must eventually pass a test $T$ if there exists a $D \in \mathbb{R}^+_0$ such that $N$ **must** $(T, D)$. Write $N \sqsubseteq^{\text{ev.}}_{\text{must}} N'$ if, whenever $N$ must eventually pass a test $T$, then so does $N'$.

**Theorem 3** Let $N$, $N'$ be finitely branching safe nets. Then $N \sqsubseteq^{\text{ev.}}_{\text{must}} N'$ iff $N \sqsubseteq^J N'$.

A proof can be found in [22, Appendix D].

## **10 Conclusion**

The just failures preorder $\sqsubseteq^J$ was introduced by Walter Vogler [43] in 2002. Since then it has not received much attention in the literature, and has not been used as the underlying semantic principle justifying actual verifications. In my view this can be seen as a fault of the subsequent literature, as $\sqsubseteq^J$ captures exactly what is needed, no more and no less, for the verification of safety and liveness properties of realistic systems.

I substantiate this claim by pointing out that $\sqsubseteq^J$ is the coarsest preorder preserving safety and liveness properties when assuming justness, that is a congruence for basic process algebra operators, such as the partially synchronous parallel composition, abstraction from internal actions, and renaming. As argued in [25,18,24,17], justness is better motivated and more suitable for applications than competing completeness criteria, such as progress or the many notions of fairness surveyed in [24].

Moreover, I adapt the well-known must-testing preorder of De Nicola & Hennessy [9], by using justness as the underlying completeness criterion, instead of the traditional choice of progress. By showing that the resulting must-testing preorder $\sqsubseteq^J_{\text{must}}$ coincides with $\sqsubseteq^J$ I strengthen the case that this is a natural and fundamental preorder.

This conclusion is further strengthened by my result that it also coincides with a qualitative version $\sqsubseteq^{\text{ev.}}_{\text{must}}$ of the timed must-testing preorder $\sqsubseteq^{\text{timed}}_{\text{must}}$ of Vogler [43]. (Although $\sqsubseteq^{\text{timed}}_{\text{must}}$ and $\sqsubseteq^J$ stem from the same paper [43], this connection was not made there.)

All this was shown in the setting of Petri nets extended with read arcs, and therefore also applies to the settings of standard process algebras such as CCS, CSP or ACP. Since I cover read arcs, it also applies to process algebras enriched with signalling, an operator that extends the expressiveness of standard process algebras and is needed to accurately model mutual exclusion. I leave it for future work to explore these matters for probabilistic models of concurrency, or other useful extensions.

**Fig. 7.** A spectrum of testing preorders and bisimilarities preserving liveness properties

Fig. 7 situates $\sqsubseteq^J_{\text{must}}$ w.r.t. some other semantic preorders from the literature. The lines indicate inclusions. Here $\sqsubseteq^{Pr}_{\text{must}}$, $\sqsubseteq_{\text{may}}$ and $\sqsubseteq_{\text{should}}$ are the classical must-, may- and should-testing preorders from [9] and [5,36] (see Def. 9) and $\sqsubseteq^{Pr}_{\text{reward}}$ is the reward-testing preorder introduced by me in [19]. The failures-divergences preorder of CSP [6,42], defined in a similar way as $\sqsubseteq^J_{\text{must}}$, coincides with $\sqsubseteq^{Pr}_{\text{must}}$ [9,19]. $\leftrightarrow$ denotes the classical notion of strong bisimilarity [34], and $\leftrightarrow_{ep}$, $\leftrightarrow_{sp}$ are essentially the only other preorders (in fact equivalences) that preserve linear time properties when assuming justness: the *enabling preserving bisimilarity* of [26] and the *structure preserving bisimilarity* of [16].

The inclusions follow directly from the definitions—see refs. —and counterexamples against further inclusions appear below.

## **References**


in Informatics (LIPIcs), vol. 203. Schloss Dagstuhl–Leibniz-Zentrum für Informatik (2021). https://doi.org/10.4230/LIPIcs.CONCUR.2021.33, https://arxiv.org/abs/2108.00142


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Model and Program Repair via Group Actions**

Paul C. Attie<sup>1</sup> and William L. Cocke<sup>1</sup>

School of Computer and Cyber Sciences, Augusta University, Augusta, GA, USA pattie@augusta.edu, wcocke@augusta.edu

**Abstract.** Given a textual representation of a finite-state concurrent program P, one can construct the corresponding Kripke structure M. However, the size of M can be exponentially larger than the textual size of P. This state explosion can make model checking properties of P via M expensive or even infeasible. The action of a symmetry group G on M can be used to produce a smaller Kripke structure M̄. Various authors have exploited the direct correspondence between M and M̄ to perform model checking. When the structure M does not satisfy a formula, one can look for a substructure that will satisfy the formula. We call this substructure-repair: identifying a substructure N of M that satisfies a given temporal logic formula.

In this paper we extend previous work by showing that repairs of M̄ lift to repairs of M. In other words, we can repair a computer program P, which exhibits a high degree of symmetry, by repairing the smaller Kripke structure M̄ and then symmetrizing the corresponding program. To do this we arrange the substructures of M and M̄ into substructure lattices that are ordered by substructure inclusion. We show that the substructures of M preserved by G form a (sub)lattice that maps to the substructure lattice of M̄. When restricted to the lattice of substructures of M that are "maximal" with respect to the action of G on M, the above map is a lattice isomorphism.

These results enable us to repair M̄ and then to lift the repair to M. In cases where a program has a high degree of symmetry, such as in many concurrent programs, we can repair the program by repairing the small Kripke structure M̄.

**Keywords:** Model checking · symmetry reduction · model repair

## **1 Introduction**

To model check a program P, one first constructs a Kripke structure M. In general, the Kripke structure M is generated by all potential executions of P. The model checking problem for a program P w.r.t. a temporal logic formula ϕ is to verify that the Kripke structure M generated by the execution of P satisfies ϕ [8]. A major obstacle to model checking a concurrent program via its Kripke structure is *state explosion*: in general, the size of M is exponential in the number of processes n. As studied by Emerson and Sistla [18] and extended by others [10,14,21], the use of *symmetry reduction* to ameliorate state explosion can yield a significant reduction in the complexity of model checking M ⊨ ϕ when both M and ϕ have a high degree of symmetry in the process index set {1,...,n}.

For a Kripke structure M, we capture the symmetry of M using the group G of automorphisms of both M and ϕ. The quotient structure M̄ = M/G of M by G often has significantly fewer states than M. Since M̄ can be computed directly from the original P, we avoid the expensive computation of the large structure M. Model checking M̄ ⊨ ϕ is linear in the size of M̄ [8], so this provides significant savings if M̄ is small, i.e., if G is large.

If M ⊭ ϕ, then we can search for a model N related to M such that N ⊨ ϕ. In this paper we focus on substructure-repair: we require N to be a substructure of M. The key idea behind substructure-repair is to remove execution paths which violate required properties, e.g., paths that lead to a violation of mutual exclusion. We give examples in Section 6 of different properties and substructure-repairs with respect to these properties. Substructure-repairs can always repair M w.r.t. all universal properties (those expressible using universal path quantification [26]).<sup>1</sup>

#### **1.1 Our Contributions**

We present a theory of substructures of Kripke structures. Using this theory we establish an evaluation preserving correspondence between certain substructures of the original Kripke structure M and the substructures of the quotient structure M̄ (this is Theorem 2). This correspondence is a functorial form of bisimilarity between a certain lattice of substructures of M and the lattice of substructures of M̄. Hence for a given formula ϕ, substructure-repairs of M̄ with respect to ϕ can be lifted to substructure-repairs of M with respect to ϕ (this is Theorem 3). This correspondence of Kripke substructure lattices is of independent mathematical interest as an example of a monotone Galois connection.

We build on our theory to extend group theoretic model checking to concurrent program repair: given a concurrent program P that may not satisfy ϕ, modify P to produce a program that does satisfy ϕ. Given P, ϕ, and a group G that acts on both P and ϕ, our method directly computes the quotient M/G (following [18]), then repairs M/G using the algorithm of [2], and finally extracts a correct program from the repaired structure.

The rest of the paper proceeds as follows: Section 3 contains the formal definition of Kripke structures and substructures. In Section 4, after briefly recalling group actions, we show how one can use a group to obtain a quotient M̄ of M and the repair correspondence between M and M̄. We extend our results to the repair of concurrent programs in Section 5. Section 6 presents some examples. In particular, we show that a structure M might have a nonempty repair even if the quotient M̄ does not. In Section 7 we examine what classes of Kripke structures and what types of formulae guarantee the existence of quotient based repairs.

<sup>1</sup> Existential path properties could be dealt with by first adding sufficient transitions to M so that the augmented structure now contains the desired paths. One can then perform substructure-repair so that universal path properties are also satisfied.

## **2 Related Work**

Our work combines model/program repair [5,25,29,32] and symmetry reductions via group actions [7, 10, 16, 18–22]. Le Goues et al. [25] provide a modern introduction to program repair, although their results generally relate to program repair based on the textual representation of the program. Our approach repairs a Kripke structure w.r.t. a computation tree logic (CTL) formula and uses that to repair the corresponding program.

### **2.1 Computation Tree Logic Repair**

Buccafuri et al. [5] posed the repair problem for CTL and solved it using abductive reasoning to generate repair suggestions that are verified by model checking. Jobstmann et al. [29] and Staber et al. [32] used game-based repair methods for programs and circuits, although their method is complete for invariants only.

Chatzieleftheriou et al. [6] repair abstract structures, using Kripke modal transition systems and 3-valued CTL semantics. Von Essen and Jobstmann [23] present a game-based repair method which attempts to keep the repaired program close to the original faulty program, by also specifying a set of traces that the repair must leave intact.

The work of Attie et al. [2] establishes that repair by abstraction can avoid state explosion. However, repairs of abstracted structures do not always lift to repairs of the original structure. Within networks, Namjoshi and Trefler [30] have shown that a combination of abstraction and group actions can be used to produce smaller structures.

#### **2.2 Group theoretic model checking**

Group theoretic approaches to symmetry-reduction in model checking began in 1995 with work by Emerson and a collection of coauthors [7, 10, 14, 16, 18–22], who compute the quotient M/G and model check M/G instead of the original (much larger) structure M. The group theoretic approach to model checking works because M and M/G are bisimilar with respect to certain formulae.

A requirement for group theoretic model checking or repair is calculating the group of symmetries in question. We will see that larger groups of symmetries result in smaller quotient models. Clarke et al. [7] showed that calculating the orbit of a group action, a part of model checking via symmetry, is at least as difficult as graph isomorphism. However, in many practical cases concurrent programs have a natural symmetry obtained by swapping certain processes. Hence many concurrent programs have a small symmetry group that is known in advance. Donaldson and Miller [11] showed that a larger symmetry group for a program can be built from a smaller symmetry group.

A related approach is the use of structural methods to express symmetric designs, e.g., parameterized systems, where processes are all instances of a common template (possibly with a distinguished controller process) [1, 9, 24], and rings of processes, where all communication is between a process and its neighbors in the ring [9, 15, 17].

# **3 Temporal Logic and Kripke Structures**

Computation tree logic (CTL) is a propositional branching-time temporal logic used to model the possible computational branches taken by a system [12, 13]. The semantics of CTL are defined with respect to a Kripke structure.

**Definition 1 (Kripke structure).** *A Kripke structure M is a tuple (S, S<sub>0</sub>, T, L, AP) where S is a finite set of states, S<sub>0</sub> ⊆ S is a set of initial states, T ⊆ S × S is a transition relation, AP is a finite set of atomic propositions, and L : S → 2<sup>AP</sup> is a labeling function that associates each state s ∈ S with a subset of atomic propositions, namely those that hold in state s.*

We require that M be total: ∀s ∈ S, ∃t ∈ S : (s, t) ∈ T, and that S ≠ ∅ implies S<sub>0</sub> ≠ ∅. Also, different states have different labels: s ≠ t ⇒ L(s) ≠ L(t). We admit the empty Kripke structure, i.e., S = ∅, due to mathematical necessity.

When referring to the constituents of M = (S, S<sub>0</sub>, T, L, AP), we write M<sub>S</sub>, M<sub>S<sub>0</sub></sub>, M<sub>T</sub>, M<sub>L</sub>, and M<sub>AP</sub> respectively. State t is a *successor* of state s in M iff (s, t) ∈ T. We will write s → t in this case. A path π in M is a (finite or infinite) sequence of states, π = s<sub>0</sub>, s<sub>1</sub>, ..., such that ∀i ≥ 0 : (s<sub>i</sub>, s<sub>i+1</sub>) ∈ T.
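As a rough illustration of Definition 1 and its side conditions, the following sketch encodes a Kripke structure as a Python data class. The class name `Kripke`, its field names, and the two-state `toggle` structure are illustrative assumptions, not notation from this paper.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    states: frozenset       # S
    initial: frozenset      # S_0
    transitions: frozenset  # T, a set of (s, t) pairs
    labels: dict            # L, maps each state to a frozenset of propositions
    ap: frozenset           # AP

    def is_total(self):
        """Every state has at least one outgoing transition."""
        return self.states <= {s for (s, _) in self.transitions}

    def has_distinct_labels(self):
        """Different states carry different labels."""
        return len({self.labels[s] for s in self.states}) == len(self.states)

    def is_well_formed(self):
        """Check the side conditions stated around Definition 1."""
        return (
            self.initial <= self.states
            and all(s in self.states and t in self.states
                    for (s, t) in self.transitions)
            and all(self.labels[s] <= self.ap for s in self.states)
            and self.is_total()
            and (not self.states or bool(self.initial))
            and self.has_distinct_labels()
        )


# Hypothetical two-state structure: proposition p holds in state "a" only.
toggle = Kripke(
    states=frozenset({"a", "b"}),
    initial=frozenset({"a"}),
    transitions=frozenset({("a", "b"), ("b", "a")}),
    labels={"a": frozenset({"p"}), "b": frozenset()},
    ap=frozenset({"p"}),
)
print(toggle.is_well_formed())
```

Under this encoding the empty structure, with all components empty, also passes `is_well_formed`, which matches the decision to admit it.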

To model the behavior of a concurrent program P = P<sub>1</sub>, ..., P<sub>n</sub>, we define a special type of Kripke structure: a *multiprocess Kripke structure* is one in which the set of atomic propositions AP is partitioned into disjoint subsets AP<sub>1</sub>, ..., AP<sub>n</sub>, states have the form (s<sub>1</sub>, ..., s<sub>n</sub>), and transitions T are partitioned into disjoint subsets T<sub>1</sub>, ..., T<sub>n</sub>. The set of atomic propositions "owned" by P<sub>i</sub> is denoted by AP<sub>i</sub>: they can only be changed by P<sub>i</sub>, but can be read by other processes. The local state of P<sub>i</sub> is written as s<sub>i</sub>, and is labelled by the subset of AP<sub>i</sub> whose propositions are true in s<sub>i</sub>. Then, the truth value of p ∈ AP<sub>i</sub> in global state (s<sub>1</sub>, ..., s<sub>n</sub>) is given by its value in local state s<sub>i</sub>. T<sub>i</sub> gives the transitions of process P<sub>i</sub>, which are denoted as $s \xrightarrow{i} t$. For state s = (s<sub>1</sub>, ..., s<sub>n</sub>), we write s<sub>i</sub> for its i-th component and define s↓i = (s<sub>1</sub>, ..., s<sub>i−1</sub>, s<sub>i+1</sub>, ..., s<sub>n</sub>). We then require s↓i = t↓i for every transition $s \xrightarrow{i} t$, i.e., transitions by P<sub>i</sub> do not change atomic propositions of other processes.

A CTL formula ϕ is evaluated (i.e., is true or false) in a state s of a Kripke structure M [13]. We write M, s ⊨ ϕ when ϕ is true in state s of structure M, and write M ⊨ ϕ to abbreviate ∀s<sub>0</sub> ∈ S<sub>0</sub> : M, s<sub>0</sub> ⊨ ϕ, i.e., ϕ holds in all initial states of M. The formal definition of ⊨ proceeds by induction on the structure of CTL formulae [12, 13] and is omitted for space reasons.

*Example 1* (Example Box) The "Box" Kripke structure in Figure 1 has 4 states and transitions as shown. Its set of atomic propositions is empty, and so all states have empty labels, as indicated by "()". There is a natural group acting on this Kripke structure, i.e., the group generated by the action which exchanges the state **s1** with **s2**, and the state **t1** with **t2**.

The theory of substructures presented below is motivated by the concept of a substructure-repair of a structure M with respect to a formula ϕ, i.e., a substructure N of M such that N ⊨ ϕ.


**Fig. 1.** The Box Kripke structure.

**Definition 2 (Substructure, ≤).** *Given Kripke structures M and N, we say that N is a substructure of M, denoted N ≤ M, iff the following all hold:*

1. N<sub>S</sub> ⊆ M<sub>S</sub>.


For mathematical necessity in what follows, we allow for the 'empty' substructure. We do not, however, accept an empty substructure as a valid repair. It is immediate that ≤ is a reflexive partial order. Lemmas 1 and 2 below imply that the substructures of M can be regarded as a lattice, with join and meet operations as follows.

**Lemma 1.** *Let M be a Kripke structure and suppose that N and N′ are substructures of M. Then*

$$\mathcal{N} \vee \mathcal{N}' = (\mathcal{N}\_S \cup \mathcal{N}'\_S, \mathcal{N}\_{S\_0} \cup \mathcal{N}'\_{S\_0}, \mathcal{N}\_T \cup \mathcal{N}'\_T, \mathcal{M}\_L \upharpoonright (\mathcal{N}\_S \cup \mathcal{N}'\_S), \mathcal{M}\_{AP})$$

*is the smallest substructure of M containing both N and N′.*

Given a nonempty finite set X = {X<sub>0</sub>, X<sub>1</sub>, ..., X<sub>n</sub>} of substructures of M, we define the structure ⋁X = X<sub>0</sub> ∨ X<sub>1</sub> ∨ ··· ∨ X<sub>n</sub>.

**Lemma 2.** *Let M be a Kripke structure and suppose that N and N′ are substructures of M. Then there exists a largest substructure of M contained in both N and N′.*

**Definition 3 (Join, Meet of Substructures).** *Let N and N′ be two substructures of M. The join of N and N′, written N ∨ N′, is the smallest substructure of M containing both N and N′. The meet of N and N′, written N ∧ N′, is the largest substructure of M contained in both N and N′.*

The join N ∨ N′ has a simple description as given in Lemma 1. However, the meet N ∧ N′, while well-defined, does not have such a simple description. It is possible that for two substructures N and N′ of a Kripke structure M, there are no non-empty substructures contained in both N and N′. Hence the largest substructure contained in both N and N′ could be empty.
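The contrast between join and meet can be made concrete with a small sketch. It assumes that a substructure is determined by a triple (states, initial states, transitions) included componentwise in M, that it must be total, and that a nonempty structure needs an initial state; labels are inherited from M and omitted. The function names and the example are hypothetical, and the pruning loop is one plausible way to compute the meet, not a procedure taken from this paper.

```python
EMPTY = (frozenset(), frozenset(), frozenset())


def join(n1, n2):
    """Componentwise union of (states, initial, transitions), as in Lemma 1."""
    return tuple(frozenset(a | b) for a, b in zip(n1, n2))


def meet(n1, n2):
    """Intersect componentwise, then prune until the result is total again."""
    states = set(n1[0] & n2[0])
    initial = n1[1] & n2[1]
    trans = set(n1[2] & n2[2])
    while True:
        # Keep only transitions between surviving states.
        trans = {(s, t) for (s, t) in trans if s in states and t in states}
        live = {s for (s, _) in trans}  # states that still have a successor
        if live == states:
            break
        states = live
    initial = initial & states
    if not initial:
        # Assumption: without an initial state only the empty structure remains.
        return EMPTY
    return (frozenset(states), frozenset(initial), frozenset(trans))


# Two hypothetical substructures that share only the state "a".
n1 = (frozenset({"a", "b"}), frozenset({"a"}),
      frozenset({("a", "b"), ("b", "a")}))
n2 = (frozenset({"a", "c"}), frozenset({"a"}),
      frozenset({("a", "c"), ("c", "a")}))

print(join(n1, n2))
print(meet(n1, n2) == EMPTY)
```

Here the shared state "a" has no successor inside both substructures, so pruning removes it and the meet collapses to the empty structure, which is the situation described above.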

We can now define a lattice of substructures Λ<sub>M</sub> for a given structure M.

**Definition 4 (Lattice of Substructures).** Given a Kripke structure M, the *lattice of substructures of* M is Λ<sub>M</sub> = ({N : N is a substructure of M}, ≤), where the meet and join in Λ<sub>M</sub> are as given in Definition 3.

## **4 Quotient Structures**

We capture the symmetry in a Kripke structure M with the notion of state-mapping: a graph isomorphism on M which preserves initial states. State-mappings also preserve paths since they are isomorphisms. We ignore for now the labelling function M<sub>L</sub>, i.e., which atomic propositions hold in which states, and concern ourselves only with the graph structure of M. Since the atomic proposition labelling obviously affects the truth of CTL formulae in states of M, it must be accounted for. We do this below using the notion of G-invariant CTL formula. Thus, we decompose the symmetry characterization of M into two separate concerns: the graph structure of M, handled using state-mappings, and the atomic proposition labelling of states of M, handled using G-invariant CTL formulae.

A type of symmetry of particular interest is the symmetry of a multiprocess Kripke structure w.r.t. the process indices 1,...,n of the corresponding concurrent program P<sub>1</sub> ∥ ··· ∥ P<sub>n</sub>, as we illustrate below. Our theory, however, applies to Kripke structures in general.

#### **4.1 Groups Acting on Kripke Structures**

**Definition 5.** A state-mapping of M is a graph isomorphism of the state-space of M such that its restriction to the initial states is also an isomorphism, i.e., takes initial states to initial states. Formally, for a Kripke structure M, a *state-mapping* of M is a bijection f : M<sub>S</sub> → M<sub>S</sub> such that:

$$\begin{array}{l} - \ f(\mathcal{M}\_{S\_0}) = \mathcal{M}\_{S\_0};\\ - \text{ For states } s, t \in \mathcal{M}\_S \text{ we have that } (s, t) \in \mathcal{M}\_T \iff (f(s), f(t)) \in \mathcal{M}\_T. \end{array}$$

The set of all state-mappings of M forms a group. This means that the composition of any two state-mappings is another state-mapping, and for any state-mapping f on M there is another state-mapping g on M such that f(g(s)) = s and g(f(s)) = s. We refer to the texts by Isaacs [27, 28] and Serre [31] for a more in-depth introduction to group theory.
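The two conditions of Definition 5, and the group properties just mentioned, can be checked mechanically on a small example. The sketch below represents a candidate map as a Python dict. The four-state structure is a hypothetical one that reuses the state names of Example Box; it is not claimed to be the structure of Figure 1, and the function names are likewise illustrative.

```python
def is_state_mapping(f, states, initial, transitions):
    """Check Definition 5 for a candidate map f given as a dict."""
    if set(f) != states or set(f.values()) != states:
        return False  # not a bijection on the state set
    keeps_initial = {f[s] for s in initial} == initial
    keeps_edges = all(
        ((f[s], f[t]) in transitions) == ((s, t) in transitions)
        for s in states for t in states
    )
    return keeps_initial and keeps_edges


def compose(f, g):
    """The map s -> f(g(s))."""
    return {s: f[g[s]] for s in g}


def inverse(f):
    return {t: s for s, t in f.items()}


# Hypothetical structure with states s1, s2, t1, t2.
S = {"s1", "s2", "t1", "t2"}
S0 = {"s1", "s2"}
T = {("s1", "t1"), ("t1", "s1"), ("s2", "t2"), ("t2", "s2"),
     ("t1", "s2"), ("t2", "s1")}

swap = {"s1": "s2", "s2": "s1", "t1": "t2", "t2": "t1"}
half_swap = {"s1": "s2", "s2": "s1", "t1": "t1", "t2": "t2"}
identity = {s: s for s in S}

print(is_state_mapping(swap, S, S0, T))       # swaps both pairs together
print(is_state_mapping(half_swap, S, S0, T))  # swaps only s1 and s2
print(compose(swap, swap) == identity)
```

Swapping only the s-states fails because the edge (s1, t1) would be sent to (s2, t1), which is not a transition of this structure.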

**Definition 6 (G-closed).** For a group G of state-mappings of a Kripke structure M, a substructure N of M is called *G-closed* if G is a group of state-mappings of N, i.e., for every g ∈ G and s ∈ N<sub>S</sub> we have g(s) ∈ N<sub>S</sub>.
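A minimal sketch of a G-closedness test follows. It reads "G is a group of state-mappings of N" as requiring that every generator of G maps the states, the initial states, and the transitions of N onto themselves, which is an interpretation on our part. Substructures are given as triples (states, initial states, transitions), and the example structure and all names are hypothetical.

```python
def is_g_closed(sub, generators):
    """Check that each generator maps the substructure onto itself."""
    states, initial, transitions = sub
    for g in generators:
        if {g[s] for s in states} != states:
            return False
        if {g[s] for s in initial} != initial:
            return False
        if {(g[s], g[t]) for (s, t) in transitions} != transitions:
            return False
    return True


# Generator that swaps the indices 1 and 2 (hypothetical example).
swap = {"s1": "s2", "s2": "s1", "t1": "t2", "t2": "t1"}

whole = (
    {"s1", "s2", "t1", "t2"},
    {"s1", "s2"},
    {("s1", "t1"), ("t1", "s1"), ("s2", "t2"), ("t2", "s2"),
     ("t1", "s2"), ("t2", "s1")},
)
# Two disjoint loops: still symmetric under the swap.
pairs = (
    {"s1", "s2", "t1", "t2"},
    {"s1", "s2"},
    {("s1", "t1"), ("t1", "s1"), ("s2", "t2"), ("t2", "s2")},
)
# Only the loop on index 1: the swap leads outside the substructure.
left = ({"s1", "t1"}, {"s1"}, {("s1", "t1"), ("t1", "s1")})

print(is_g_closed(whole, [swap]), is_g_closed(pairs, [swap]),
      is_g_closed(left, [swap]))
```

Checking generators suffices here because every element of a finite group is a product of generators.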

**Lemma 3.** *Let M be a Kripke structure and let G be a group of state-mappings of M. Let N, N′ be two G-closed substructures of M. Then N ∨ N′ and N ∧ N′ are both G-closed.*

By Lemma 3, we see that the G-closed substructures of M form a sublattice of Λ<sub>M</sub>. This is a proper sublattice in that the meet and join operations are the same as those of Λ<sub>M</sub>.

**Definition 7 (Lattice of G-closed substructures).** *Given a Kripke structure M and a group G of state-mappings of M, the poset of G-closed substructures of M forms a lattice. We call this the lattice of G-closed substructures of M and write it as Λ<sub>M,G</sub>.*

*Example 1 (Example Box).* Let M be Example Box, i.e., the Kripke structure presented in Figure 1. Let g be the map that simultaneously switches **s1** and **s2**, and switches **t1** and **t2**, i.e., g(**s1**) = **s2**, g(**s2**) = **s1**, g(**t1**) = **t2**, g(**t2**) = **t1**. Let G be the group consisting of g and the identity map on M<sub>S</sub>. We note that G is not the entire group of state-mappings of M. The structure M has 10 G-closed substructures, including the empty structure. We present some of these structures in Figure 2.

**Fig. 2.** Four G-closed substructures of Example Box, where G is the group generated by the simultaneous swapping of indexes of both the **s<sub>i</sub>** and the **t<sub>i</sub>**. Note that each of the structures is a substructure of the substructure to its right. Looking ahead to Definition 10, only the entire structure (d) is G-maximal.

#### **4.2 Constructing the Quotient structure**

Given a group G of state-mappings of a structure M, we want to construct a quotient structure M/G. However, as noted, state-mappings do not contain any information about M<sub>L</sub>. To remedy this situation, we need a function that assigns a representative to each orbit of G, where for s ∈ M<sub>S</sub> the orbit of s is {g(s) : g ∈ G}.

**Definition 8 (Representative map).** *Let M be a Kripke structure and suppose that G is a group of state-mappings of M. A representative map of M with respect to G is a function ϑ<sub>G</sub> : M<sub>S</sub> → M<sub>S</sub> satisfying the following:*


We define ϑ<sub>G</sub>(S) = {ϑ<sub>G</sub>(s) | s ∈ S}.

**Definition 9 (Quotient structure).** *Given a Kripke structure M, a group G of state-mappings of M, and a representative map ϑ<sub>G</sub> of M with respect to G, we define the quotient structure M̄ = M/(G, ϑ<sub>G</sub>) of M with respect to G and ϑ<sub>G</sub> as follows, where we write s̄, t̄ for ϑ<sub>G</sub>(s), ϑ<sub>G</sub>(t), respectively:*


Thus *the states of a quotient structure correspond exactly to the orbits of states of the original structure* under the group of state mappings. For transitions, we have a slightly more subtle correspondence. Consider the following examples:

*Example 2.* In Figure 3 we demonstrate the correspondence between Kripke structures, G-closed substructures, and their quotients. In the figure, we present a multiprocess Kripke structure M corresponding to two concurrent processes P<sub>1</sub> (atomic propositions and transitions in blue) and P<sub>2</sub> (atomic propositions and transitions in red). The group G of state-mappings swaps the indexes of the processes. This structure has a G-closed substructure N constructed by removing the 'center' state **u0**. Define ϑ<sub>G</sub> to take the 'left-most' state in the orbit, i.e., ϑ<sub>G</sub>(**t1**) = **t0**, ϑ<sub>G</sub>(**t5**) = **t2**, ϑ<sub>G</sub>(**u0**) = **u0**, ϑ<sub>G</sub>(**t6**) = **t3**, ϑ<sub>G</sub>(**t4**) = **t4**. The quotient structure M/(G, ϑ<sub>G</sub>) appears in the top right. While here the quotient structure is isomorphic to a substructure of M, this is not always the case. (See Figure 6 in Example 5 for an example where the quotient gains a new transition.) The quotient structure N/(G, ϑ<sub>G</sub>|<sub>N<sub>S</sub></sub>) appears in the bottom right.

*Example 3 (Example Box).* Let M and G be as in Example 1. Let ϑ<sub>G</sub> be defined by ϑ<sub>G</sub>(**s1**) = ϑ<sub>G</sub>(**s2**) = **s1** and ϑ<sub>G</sub>(**t1**) = ϑ<sub>G</sub>(**t2**) = **t1**. Then the quotient structure M/(G, ϑ<sub>G</sub>) has exactly 2 states, **s1** and **t1**, with transitions (**s1**, **t1**), (**t1**, **s1**). The G-closed substructures of M given in Figure 2 (a), (b), and (c) also map to this quotient structure via N ↦ N/(G, ϑ<sub>G</sub>). Note that the transition (**t1**, **s1**) is present in the quotient, but is not present, for example, in the structure of Figure 2 (b). However, the "corresponding" transition (**t2**, **s1**) is present in Figure 2 (b).
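The orbit, representative map, and quotient constructions can be sketched in a few lines. Because the itemized conditions of Definitions 8 and 9 are not reproduced above, the sketch assumes the usual construction following Emerson and Sistla [18]: the representative map picks one fixed state per orbit (here the smallest name), and the quotient consists of the images of the states, initial states, and transitions under that map. The example structure and all function names are hypothetical.

```python
def orbit(s, generators):
    """All states reachable from s by repeatedly applying generators."""
    seen, frontier = {s}, [s]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = g[x]
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return frozenset(seen)


def representative_map(states, generators):
    """Assumed choice: send each state to the smallest name in its orbit."""
    return {s: min(orbit(s, generators)) for s in states}


def quotient(states, initial, transitions, theta):
    """Image of a structure under the representative map theta."""
    return (
        frozenset(theta[s] for s in states),
        frozenset(theta[s] for s in initial),
        frozenset((theta[s], theta[t]) for (s, t) in transitions),
    )


S = {"s1", "s2", "t1", "t2"}
S0 = {"s1", "s2"}
T = {("s1", "t1"), ("t1", "s1"), ("s2", "t2"), ("t2", "s2"),
     ("t1", "s2"), ("t2", "s1")}
swap = {"s1": "s2", "s2": "s1", "t1": "t2", "t2": "t1"}

theta = representative_map(S, [swap])
whole_quotient = quotient(S, S0, T, theta)

# A smaller G-closed substructure with the same quotient.
T_pairs = {("s1", "t1"), ("t1", "s1"), ("s2", "t2"), ("t2", "s2")}
pairs_quotient = quotient(S, S0, T_pairs, theta)

print(whole_quotient)
print(whole_quotient == pairs_quotient)
```

In this hypothetical structure two different G-closed substructures share one quotient, which mirrors the observation made in Example 3.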

**Fig. 3.** As discussed in Example 2, we have a Kripke structure in the top left and a G-closed substructure in the bottom left. On the right, we have the quotients of the whole structure (top) and the G-closed substructure (bottom).

In the sequel, we fix a Kripke structure M, a group G of state-mappings of M, and a representative map ϑ<sub>G</sub> of M with respect to G.

Example 3 shows that **many G-closed substructures can have the same quotient structure**, and also that, in general, a transition in the quotient may not itself be present in the original structure. We show, however, in Theorem 1 below that a "corresponding" transition is guaranteed to be present in the original structure. These corresponding transitions can be joined into a path which corresponds state-by-state to the path in the quotient. This "path correspondence" is what allows for model checking of M via model checking of M̄ and is formalized in the following theorem from Emerson and Sistla [18, **3.1**].

**Theorem 1 (Path Correspondence Theorem).** *There is a bidirectional correspondence between paths of M and paths of M̄. Formally we have the following:*


We now extend the path correspondence between M and M̄ to a correspondence between G-closed substructures of M and substructures of M̄. Define Ψ : Λ<sub>M,G</sub> → Λ<sub>M̄</sub> by Ψ(N) = N/(G, ϑ<sub>G</sub>), so that Ψ maps a G-closed substructure N of M to a corresponding substructure of M̄. We call Ψ the *quotient map*. Ψ establishes a join-semilattice homomorphism between Λ<sub>M,G</sub> and Λ<sub>M̄</sub>, as we now show in the following series of lemmas.

**Lemma 4.** For every substructure N̄ of M̄, there is a G-closed substructure N of M such that N/(G, ϑ<sub>G</sub>) = N̄.

Lemma 4 establishes that Ψ is surjective. We note that every substructure N̄ of M̄ defines a set of states of M, i.e., the orbits of the states in N̄. However, in general, the transitions of N̄ do not uniquely define transitions in M.

The next lemma demonstrates that Ψ is a homomorphism of the join-semilattices Λ<sub>M,G</sub> and Λ<sub>M̄</sub>. We note that it is not a homomorphism of the lattices themselves because the meet of two G-closed substructures might be empty.

**Lemma 5 (Quotient map respects join).** Let N, N′ ∈ Λ<sub>M,G</sub>. Then

$$
\Psi(\mathcal{N}\vee\mathcal{N}') = \Psi(\mathcal{N})\vee\Psi(\mathcal{N}').
$$

As seen in Example 3, it is possible for multiple G-closed substructures of M to map to the same substructure of the quotient structure M̄. To obtain a single well-defined preimage for each substructure of the quotient structure, we introduce the concept of G-maximal. Recall that the join of G-closed substructures of M is G-closed.

**Definition 10 (G-maximal).** A G-closed substructure N of M is G-maximal if

$$\mathcal{N} = \bigvee\_{\substack{\mathcal{N}' \in \Lambda\_{\mathcal{M},G} \\ \mathcal{N}'/(G,\vartheta\_G) \le \mathcal{N}/(G,\vartheta\_G)}} \mathcal{N}'.$$

That is, N is the join of all G-closed substructures of M whose quotient is a substructure of the quotient of N itself, namely of N /(G, ϑG). A G-closed substructure N fails to be G-maximal exactly when there are states s, t ∈ N , such that (s, t) ∈ N /(G, ϑG), but (s, t) is not in N .

Among all of the G-closed substructures in Figure 2, only the entire structure itself is G-maximal.

**Lemma 6.** Let M′, M″ be two G-maximal substructures of M. Then M′ ∨ M″ is G-maximal and M′ ∧ M″ is G-maximal.

Lemma 6 allows us to make the following definition.

**Definition 11 (G-maximal lattice of substructures).** The set of G-maximal substructures of M forms a sublattice Λ<sub>M,G−max</sub> of Λ<sub>M</sub>.

While the quotient map from Λ<sub>M,G</sub> to Λ<sub>M̄</sub> is always surjective, when restricted to Λ<sub>M,G−max</sub> the map is also injective and is a lattice isomorphism.

**Theorem 2 (G-Maximal Lattice Correspondence).** The restriction of the quotient map Ψ to Λ<sub>M,G−max</sub> is an isomorphism from Λ<sub>M,G−max</sub> to Λ<sub>M̄</sub>, i.e., between the lattice of G-maximal substructures of M and the lattice of substructures of M̄.

At this point, we would like to remind the reader of the various lattices that we have defined and how they relate to each other:

Λ<sub>M,G−max</sub> (G-maximal substructures) ⊆ Λ<sub>M,G</sub> (G-closed substructures) ⊆ Λ<sub>M</sub> (all substructures).

### **4.3 Semantic Relationships Between Structures and Quotient Structures**

**Definition 12.** Let G be a group of state-mappings of M. A CTL formula ϕ is *G-invariant* over M if, for every state s, every g ∈ G, and all maximal propositional subformulae ϕ′ of ϕ, we have

$$\mathcal{M}, s \models \varphi' \iff \mathcal{M}, g(s) \models \varphi'.$$
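For a single propositional subformula, the condition of Definition 12 can be tested directly by comparing its truth value at s and at g(s). The sketch below does this for a hypothetical two-process structure whose global states are pairs of local states N (neutral) or C (critical). The labelling scheme, the propositions C1 and C2, and the function names are assumptions made for illustration.

```python
from itertools import product


def label(s):
    """Assumed labelling: local state x of process i yields proposition x<i>."""
    return {f"{x}{i + 1}" for i, x in enumerate(s)}


def is_invariant(states, generators, prop):
    """prop has the same truth value at s and g(s) for every generator g."""
    return all(prop(label(s)) == prop(label(g(s)))
               for s in states for g in generators)


states = set(product("NC", repeat=2))


def swap(s):
    """Exchange the two process indices."""
    return (s[1], s[0])


def some_critical(props):
    return "C1" in props or "C2" in props   # C1 or C2


def first_critical(props):
    return "C1" in props                    # C1 alone


print(is_invariant(states, [swap], some_critical))
print(is_invariant(states, [swap], first_critical))
```

The disjunction treats both indices alike and is invariant, while the proposition C1 alone distinguishes state (C, N) from its image (N, C).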

**Lemma 7.** If ϕ is G-invariant, then the valuation of ϕ in M̄ does not depend on the choice of representative map ϑ<sub>G</sub>.

This allows us to connect semantic statements about M with semantic statements about M̄ for formulae that are G-invariant. The path correspondence theorem establishes a bisimulation between M and M̄, in which state s of M and state s̄ of M̄ are bisimilar iff s is in the orbit of s̄, i.e., s = g(s̄) for some g ∈ G. We call such a bisimulation a G-bisimulation. Hence, G-bisimilar states satisfy the same propositional subformulae of any G-invariant CTL formula ϕ. A straightforward induction over path length then shows that s and s̄ satisfy the same G-invariant CTL formulae:

**Corollary 1.** M ⊨ ϕ iff M̄ ⊨ ϕ for all G-invariant CTL formulae ϕ.

**Lemma 8.** Let s ∈ M<sub>S</sub>, t ∈ M̄<sub>S</sub>. Let ϕ be a G-invariant CTL formula. If t = ϑ<sub>G</sub>(s), then M, s ⊨ ϕ ⇐⇒ M̄, t ⊨ ϕ.

Section 3 developed the theory of substructures of a Kripke structure. This development was motivated by the following definition and theorem.

**Definition 13 (Substructure-Repair).** Given a structure M and a CTL formula ϕ, we call a nonempty substructure N of M a *substructure-repair* of M with respect to ϕ if N |= ϕ.

If a CTL formula ϕ is G-invariant, then the lattice correspondence will respect the valuation of ϕ.

**Theorem 3 (Repair Correspondence).** Let ϕ be a G-invariant CTL formula. Let N be a non-empty G-closed substructure of M, s ∈ N<sub>S</sub>, and N̄ = N/(G, ϑ<sub>G</sub>). Then N, s ⊨ ϕ ⇐⇒ N̄, ϑ<sub>G</sub>(s) ⊨ ϕ.

## **5 Repair of Concurrent Programs**

A concurrent program P = P<sub>1</sub> ∥ ... ∥ P<sub>n</sub> consists of n sequential processes executing in parallel. Each process P<sub>i</sub> is a set of i-actions (s<sub>i</sub>, B, t<sub>i</sub>), where s<sub>i</sub>, t<sub>i</sub> are local states of P<sub>i</sub> and B is a guard (a predicate on the global state). We say action when we ignore the process id. We assume a given set S<sub>0</sub> of initial states. The program P<sub>1</sub> ∥ ··· ∥ P<sub>n</sub> generates a transition $s \xrightarrow{i} t$ iff P<sub>i</sub> contains an action (s<sub>i</sub>, B, t<sub>i</sub>) such that the i-th components of s and t are s<sub>i</sub> and t<sub>i</sub> respectively, and s(B) = true, where s(B) is the value of guard B in global state s. The transition updates only atomic propositions in AP<sub>i</sub>, and so s↓i = t↓i. The state-transition graph of P is the closure of this "transition generation" operation, starting in the initial state set S<sub>0</sub>.
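The "transition generation" closure described above amounts to a breadth-first exploration from the initial states. The following sketch assumes that a global state is a tuple of local states, that a guard is a Python predicate on the global state, and that processes are indexed from 0. The function `state_graph` and the two-process mutual exclusion program are hypothetical illustrations, not the programs studied in Section 6.

```python
from collections import deque


def state_graph(processes, initial):
    """Close the initial states under the transitions generated by actions."""
    states = set(initial)
    transitions = set()
    queue = deque(initial)
    while queue:
        s = queue.popleft()
        for i, actions in enumerate(processes):
            for (src, guard, dst) in actions:
                if s[i] == src and guard(s):
                    # Only the local state of process i changes.
                    t = s[:i] + (dst,) + s[i + 1:]
                    transitions.add((s, i, t))
                    if t not in states:
                        states.add(t)
                        queue.append(t)
    return states, transitions


def mutex_process(i):
    """Hypothetical process: N (neutral) -> T (trying) -> C (critical) -> N."""
    other = 1 - i
    return [
        ("N", lambda s: True, "T"),
        ("T", lambda s: s[other] != "C", "C"),  # enter only if the other is out
        ("C", lambda s: True, "N"),
    ]


states, transitions = state_graph(
    [mutex_process(0), mutex_process(1)], [("N", "N")]
)
print(len(states), len(transitions))
print(("C", "C") in states)
```

Each generated transition is stored together with the index of the process that took it, so the partition of T into T<sub>1</sub>, ..., T<sub>n</sub> can be read off directly.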

Given a concurrent program P and a CTL formula ϕ, we wish to modify P to produce a repaired program P<sub>r</sub> such that M′ ⊨ ϕ, where M′ is the state-transition graph of P<sub>r</sub>. The modification is "subtractive", that is, it only removes behaviors and does not add them. We assume henceforth that when M is a multiprocess Kripke structure over process indices 1,...,n, the symmetry group G is a subgroup of S<sub>n</sub>, the group of permutations on {1,...,n}.

## **5.1 Repair of Symmetry-reduced Structures**

We first generate the symmetry-reduced state transition graph $\overline{M}$ of $P$, using the algorithm of Emerson and Sistla [18, Figure 1]. We then apply the model repair algorithm of Attie et al. [2] to $\overline{M}$ and the specification $\phi$ of $P$. This algorithm is sound and complete: if $\overline{M}$ has some substructure that satisfies $\phi$, then the algorithm returns such a substructure $\overline{N}$; if not, the algorithm reports that no repair exists. As noted, applying this algorithm to the symmetry-reduced state transition graph is complete only with respect to the symmetric repairs; see Example 6.3.

## **5.2 Extraction of Concurrent Programs from Symmetry-reduced Structures**

We want to extract a repaired concurrent program from $\overline{N}$ using the projection method of [4,13]: each transition $s \xrightarrow{i} t$ is turned into an $i$-action $\mathit{action}(s \xrightarrow{i} t) \triangleq (s_i, B, t_i)$, with guard $B = \{|s|\}$, where $\{|s|\} \triangleq \bigl(\bigwedge_{Q \in \overline{N}_L(s)} Q\bigr) \wedge \bigl(\bigwedge_{Q \notin \overline{N}_L(s)} \neg Q\bigr)$ and $Q$ ranges over $AP$. When process $i$ is in local state $s_i$, guard $B$ checks that the current global state is actually $s$.
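As a small illustration of the guard $\{|s|\}$, the sketch below builds a predicate that holds exactly when the current labelling coincides with the labelling of $s$ on all of $AP$. The function name `state_guard`, the representation of labels as Python sets, and the proposition names are assumptions made for this example only.

```python
def state_guard(label, atomic_props):
    """Return a predicate that holds exactly in states labelled with `label`.

    Mirrors the conjunction of all Q in L(s) and of not-Q for all Q outside L(s).
    """
    label = frozenset(label)

    def guard(current_label):
        current = frozenset(current_label)
        # Every proposition must have the same truth value as in `label`.
        return all((q in current) == (q in label) for q in atomic_props)

    return guard

# Hypothetical proposition names for a two-process mutual exclusion program.
AP = ["N1", "T1", "C1", "N2", "T2", "C2"]
in_c1_t2 = state_guard({"C1", "T2"}, AP)
```

Here `in_c1_t2` accepts only the labelling `{"C1", "T2"}`; any labelling that adds or drops a proposition of `AP` is rejected.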

A key problem is that the definition of the quotient $\overline{M}$ allows transitions in which the atomic propositions of more than one process are changed, since any representative of an orbit can be chosen. Hence the repaired $\overline{N} \le \overline{M}$ can also contain such transitions, e.g., the transition from **S6** to **S1** in Figure 6 below, which we write as $[C_1\, T_2] \to [T_1\, N_2]$. Note that the propositions of both processes 1 and 2 are changed. To generate $i$-actions, such transitions must be converted so that only the atomic propositions of a single process are modified.

Define a transition from $s$ to $t$ to be *regular* iff it modifies atomic propositions in at most one $AP_i$, so that $s$ and $t$ agree on the propositions of every process other than some index $i$, and write the transition as $s \xrightarrow{i} t$. Also define a transition from $s$ to $t$ to be *irregular* iff it is not regular, i.e., it modifies atomic propositions in more than one $AP_i$, and write the transition as $s \to t$, with no process index labelling the arrow.

For each irregular transition $\bar{s} \to \bar{t} \in \overline{N}_T$, there is $g' \in G$ such that $\bar{s} \to g'(\bar{t})$ is regular. Such an element $g'$ always exists. Let $\bar{s} \to \bar{t} \in \overline{M}_T$ be arbitrary. By Definition 9, there exists $s \to t \in M_T$ such that $\bar{s} = \vartheta_G(s)$ and $\bar{t} = \vartheta_G(t)$. Hence there is some $g \in G$ such that $g(s) = \bar{s}$, since $s$ and $\bar{s}$ are in the same orbit. Since $g$ is a symmetry of $M$, we have $g(s) \to g(t) \in M_T$. Hence $\bar{s} \to g(t) \in M_T$. Now $t = h(\bar{t})$ for some $h \in G$, since $t$ and $\bar{t}$ are in the same orbit. Hence $\bar{s} \to g(h(\bar{t})) \in M_T$, and so the needed $g'$ is the product of $g$ and $h$. For example, by applying the permutation of process indices 1, 2 to $[T_1\, N_2]$, from the irregular transition $[C_1\, T_2] \to [T_1\, N_2]$ we extract the regular transition $[C_1\, T_2] \xrightarrow{1} [N_1\, T_2]$.
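The search for a group element that turns an irregular transition into a regular one can be sketched by brute force over $G$. The encoding is an assumption for illustration: states are tuples of local states with 0-based process positions, a permutation is a tuple `g` mapping position `i` to `g[i]`, and `regularize` is a hypothetical helper name.

```python
from itertools import permutations

def act(g, state):
    """Apply an index permutation: the local state of process i moves to process g[i]."""
    result = [None] * len(state)
    for i, local in enumerate(state):
        result[g[i]] = local
    return tuple(result)

def regularize(s, t, group):
    """Return (g, i, g(t)) such that s and g(t) differ in at most one position i.

    Returns None if no element of `group` makes the transition regular.
    """
    for g in group:
        gt = act(g, t)
        changed = [i for i in range(len(s)) if s[i] != gt[i]]
        if len(changed) <= 1:
            return g, (changed[0] if changed else None), gt
    return None

# The two-process example from the text: [C1 T2] -> [T1 N2].
group = list(permutations(range(2)))          # identity and the swap
g, i, target = regularize(("C", "T"), ("T", "N"), group)
```

For this input the swap is selected and the target becomes `("N", "T")`, changed only in position 0, which corresponds to the regular transition of process 1 in the 1-based numbering of the text.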

Define $\mathit{Reg}_i(\overline{N}_T)$ to be the set of regular transitions $s \xrightarrow{i} g(t)$ such that $g \in G$ and $s \to t \in \overline{N}_T$. Since $g$ can be the identity element of $G$, this accounts for both regular and irregular transitions in $\overline{N}_T$. Define $\mathit{Act}_i(\overline{N}_T) = \{\mathit{action}(s \xrightarrow{i} t) \mid s \xrightarrow{i} t \in \mathit{Reg}_i(\overline{N}_T)\}$ to be the set of actions obtained from $\mathit{Reg}_i(\overline{N}_T)$.

Define the action of $g \in G$ on syntactic elements of $P_i$ as follows. For local state $s_i$: $g(s_i) = s_{g(i)}$. For atomic proposition $Q_i$: $g(Q_i) = Q_{g(i)}$. For guard $B$, by induction: $g(\neg B) = \neg g(B)$ and $g(B_1 \wedge B_2) = g(B_1) \wedge g(B_2)$, with the base case given by $g(Q_i)$ above. For $i$-action $(s_i, B, t_i)$: $g(s_i, B, t_i) = (g(s_i), g(B), g(t_i))$. That is, we apply $g$ to all process indices in the syntactic element. Now define $\mathit{Act}^G_i(\overline{N}_T)$, the symmetrization of $\mathit{Act}_i(\overline{N}_T)$, by $\mathit{Act}^G_i(\overline{N}_T) = \{g(a) \mid g \in G,\ a \in \mathit{Act}_j(\overline{N}_T),\ g(j) = i\}$. The repaired concurrent program arises from process-wise repair: $P^G = P^G_1 \parallel \cdots \parallel P^G_n$, where $P^G_i$ consists of the $i$-actions in $\mathit{Act}^G_i(\overline{N}_T)$.
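The inductive action of $g$ on guards and actions, and the resulting symmetrization, might be sketched as follows. The syntax-tree encoding (`("prop", name, i)`, `("not", B)`, `("and", B1, B2)`), the representation of an indexed local state as a pair, and the dictionary form of permutations are all assumptions chosen for this example.

```python
def apply_to_guard(g, guard):
    """Apply permutation g (a dict on process indices) to a guard syntax tree."""
    kind = guard[0]
    if kind == "prop":                      # base case: g(Q_i) = Q_{g(i)}
        _, name, i = guard
        return ("prop", name, g[i])
    if kind == "not":                       # g(not B) = not g(B)
        return ("not", apply_to_guard(g, guard[1]))
    if kind == "and":                       # g(B1 and B2) = g(B1) and g(B2)
        return ("and", apply_to_guard(g, guard[1]), apply_to_guard(g, guard[2]))
    raise ValueError(f"unknown guard node: {kind}")

def apply_to_action(g, action):
    """g(s_i, B, t_i) = (g(s_i), g(B), g(t_i)); local states are (name, index) pairs."""
    (src, i), guard, (dst, j) = action
    return ((src, g[i]), apply_to_guard(g, guard), (dst, g[j]))

def symmetrize(group, actions_by_process, i):
    """All g(a) where a is an action of some process j and g(j) = i."""
    return {apply_to_action(g, a)
            for g in group
            for j, actions in actions_by_process.items() if g[j] == i
            for a in actions}

# Hypothetical input: only process 1 has an extracted action, guarded by N_2.
identity, swap = {1: 1, 2: 2}, {1: 2, 2: 1}
extracted = {1: {(("T", 1), ("prop", "N", 2), ("C", 1))}, 2: set()}
actions_of_2 = symmetrize([identity, swap], extracted, 2)
```

In this example process 2 receives the mirrored action, guarded by `N_1`, even though no action for process 2 was extracted directly.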

**Theorem 4.** *Let $P^G$ be the concurrent program extracted from $\overline{N}$ as above, let $N^p$ be the state transition graph generated by the execution of $P^G$, and let $\overline{N^p} = N^p/(G, \vartheta_G)$. Then $N^p$ is $G$-closed and $\overline{N^p} = \overline{N}$.*

**Corollary 2.** *Let $P^G$ be the repaired program and $\phi$ the CTL specification that was used to repair $\overline{M}$, resulting in $\overline{N}$. Then $P^G \models \phi$.*

## **6 Examples**

#### **6.1 Two process Mutual Exclusion**

We consider mutual exclusion for two processes $P_1, P_2$. Each $P_i$ has three local states: $N_i$ (neutral, computing locally), $T_i$ (trying, has requested critical section entry), and $C_i$ (in the critical region). We start with the "trivial" program $P$ shown in Figure 4, in which all action guards are "true", and apply the program repair algorithm of Section 5 to repair this program w.r.t. the specification $\phi = AG\neg(C_1 \wedge C_2) \wedge AG((T_1 \vee T_2) \Rightarrow AF(C_1 \vee C_2))$. The first conjunct specifies mutual exclusion of the critical sections (safety) and the second specifies progress: if some process requests the critical section then some process will obtain it (liveness). Figure 5 (left side) shows the Kripke structure $M$ generated by execution of $P$. Transitions of $P_1$, $P_2$ are shown in blue, red, respectively. Clearly, $M \not\models \phi$. In fact, both conjuncts are violated: $AG\neg(C_1 \wedge C_2)$ due to the reachability of state **S8** from the initial state, and $AG((T_1 \vee T_2) \Rightarrow AF(C_1 \vee C_2))$ due to the self loop on state **S4**.

$M$ has exactly two symmetries: the identity map, and the map that swaps process indices 1 and 2. Our program repair algorithm does not generate $M$, since $M$ may be large; we show $M$ only for exposition. We generate $\overline{M} = M/(G, \vartheta_G)$ directly from $P$, and we show $\overline{M}$ in Figure 5 (right side). $\overline{M}$ has a transition (shown in black) from state **S6** to **S1**, which is the quotient of the transition from **S6** to **S2** in $M$, i.e., $\vartheta_G(\mathbf{S6}) = \mathbf{S6}$ and $\vartheta_G(\mathbf{S2}) = \mathbf{S1}$, so the edge $(\vartheta_G(\mathbf{S6}), \vartheta_G(\mathbf{S2}))$ occurs in $\overline{M}$.

Figure 6 shows the repair $\overline{N}$ of the reduced structure $\overline{M}$, and the resultant lifting of the repair to $M$. The deleted transitions and states are shown dashed. Figure 7 shows the repaired concurrent program $P^G$ that is extracted from $\overline{N}$. Note that $\oplus$ means disjunction [3]. By Corollary 2, $P^G \models \phi$.

**Fig. 4.** Initial incorrect mutual exclusion program from Section 6.1.

#### **6.2** *n***-Process Mutual Exclusion**

We now consider mutual exclusion for n processes. To reduce clutter, we remove the trying state **Ti**, and we give a concrete example for 3 processes; the generalization to n processes is straightforward. Each process can move directly from N to C with the appropriate indices, i.e., the guards on all actions are initially "true", just as in Figure 4.

**Fig. 5.** The original model $M$ and quotient $\overline{M} = M/(G, \vartheta_G)$ for the Kripke structures in Section 6.1.

**Fig. 6.** The repair of $\overline{M}$ and the lifting of the repair to $M$ from Section 6.1.

**Fig. 7.** The mutual exclusion concurrent program extracted from $\overline{M}$ in Figure 6.

We consider the mutual exclusion specification $\bigwedge_{i \neq j} AG\neg(C_i \wedge C_j)$. The group of state mappings $G$ for both structure and specification is the full permutation group on the indices $\{1,\ldots,N\}$. For $N$ processes, the quotient model by the full group of symmetries has $N + 1$ states, while the original model has $2^N$ states. Figure 8 shows the repair of the quotient $\overline{M}$ and then the lifting of the repair to the original structure $M$. Figure 9 shows the correct (repaired) program $P^G$ that is extracted from the repaired quotient in Figure 8. For $N$ processes, the guard on actions of $P^G_i$ is $\bigwedge_{j \neq i} N_j$.
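The state counts quoted for this example ($2^N$ global states versus $N + 1$ orbits under the full permutation group) can be checked by enumeration. The sketch assumes each process has only the local states `N` and `C`, and uses sorting as one possible choice of orbit representative; the paper's representative map $\vartheta_G$ may pick different representatives.

```python
from itertools import product

def orbit_representative(state):
    """One canonical representative per orbit of the full permutation group:
    permuting process indices cannot change the sorted tuple of local states."""
    return tuple(sorted(state))

def count_states(n):
    """Return (number of global states, number of orbits) for n two-state processes."""
    full = set(product("NC", repeat=n))
    quotient = {orbit_representative(s) for s in full}
    return len(full), len(quotient)

sizes = {n: count_states(n) for n in range(1, 6)}
```

An orbit is determined by how many processes are in `C`, which ranges from 0 to $N$ and gives the $N + 1$ quotient states.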

#### **6.3 No** *G***-closed Repairs**

Consider the structure in Figure 10 and the formula $f = AX\,AX\,AX\,P$. The structure $M$ has a single initial state. Let $G$ be the group consisting of the identity and the map swapping S1 and S2. In Figure 10 we see that the quotient structure $M/(G, \vartheta_G)$ does not have any nonempty repairs with respect to $f$. But $M$ does contain a substructure $N$ that satisfies $f$.

## **7 Relative Completeness of Group Theoretic Repair**

By the Repair Correspondence (Theorem 3), the existence of a repair $\overline{N}$ of $\overline{M}$ implies the existence of a repair $N$ of $M$. In Example 6.3, we gave an example in which a repair $N$ of $M$ exists but no $G$-closed repair does, i.e., $\overline{M}$ has no repairs. This leads us to ask: is there a fragment of CTL, and/or a class of Kripke structures, for which group theoretic repair is complete? That is, the existence of a repair (substructure $N$ of $M$ that satisfies $\phi$) implies the existence of a $G$-closed repair (a $G$-closed substructure $N'$ of $M$ that satisfies $\phi$).

One attempt to answer this question is to examine formulae and structures where substructures are equivalent to the smallest $G$-closed substructure containing them. Assume there exists $N \le M$ such that $N \models \phi$. Write $N^G$ for the smallest $G$-closed structure that contains $N$. We call $N^G$ the $G$-closure of $N$ in $M$. If $N^G$ is bisimilar to $N$, then $N^G \models \phi$ and $\overline{N^G} \models \phi$, where $\overline{N^G}$ is a substructure of $\overline{M}$.

In [14], Emerson et al. give a criterion for a structure $M$ to be bisimilar to the symmetrized structure $M^G$: for any transition $(s, t) \in M^G_T$, there must be a $g \in G$ such that $(s, g\,t) \in M_T$. When asking about substructures, it is not clear what criterion on $M$ is needed to ensure that each substructure $N$ of $M$ is bisimilar to $N^G$.

**Definition 14 ($G$-Repair Complete).** *Let $M$ be a Kripke structure with a group of state mappings $G$ and $\phi$ a $G$-invariant CTL formula. Let $N \le M$ be any repair of $M$ with respect to $\phi$, and let $s$ be any state in $N_S$. Then the pair $(M, \phi)$ is $G$-repair complete if: $N, s \models \phi$ implies that for all $g \in G$, we have $N^G, g(s) \models \phi$.*

It is clear that propositional formulae are always G-repair complete. In addition we note the following:

**Theorem 5.** *If $\phi$ and $\psi$ are purely propositional formulae then for any Kripke structure $M$, the pair $(M, A[\phi \mathbin{R} \psi])$ is $G$-repair complete.*

There exist structures $M$ and formulae $\phi$, $\psi$ such that $(M, \phi)$ and $(M, \psi)$ are both $G$-repair complete, but $(M, \phi \wedge \psi)$ is not $G$-repair complete.

**Fig. 8.** The Kripke structure defined in Section 6.2. On the left is the repair of $\overline{M}$; the lifting of the repair to $M$ appears on the right.

**Fig. 9.** The repaired program P <sup>G</sup> for the program in Section 6.2.

**Fig. 10.** The models from Section 6.3 from left to right: the model M, the quotient of M, a repair of M with respect to f = AXAXAXP that is not G-closed.

*Example 4.* Let $M$ be the Kripke structure described by Figure 11. Let $G$ be the group of state mappings generated by swapping $s_1$ and $s_2$. Let $\phi = A[p \mathbin{R} q]$ and $\psi = AF\neg q$. The structure $M$ has a nonempty $G$-closed repair for $\phi$. Similarly there is a single nonempty $G$-closed repair for $\psi$. But $M$ has no $G$-closed repairs of $\phi \wedge \psi$.

**Fig. 11.** The Kripke Structure from Example 4 (note that (b, s0) is a transition, while (b, r) is not) (left), G-closed repairs of M with respect to the formulae A[p R q] (center), and AF¬q (right).

## **8 Conclusions**

We present a theory of how group actions could be used to assist in the repair of a Kripke structure.

We presented a theory for the substructures of a given Kripke structure $M$, their organization into lattices, and how these substructures interact with a group of state-mappings of $M$. We showed a lattice isomorphism between substructure-repairs of $\overline{M}$ and $G$-maximal repairs of $M$ (Theorem 3: Repair Correspondence). This monotone Galois correspondence guarantees that a repair of $\overline{M}$ lifts to a repair of $M$: that is, model repairs of $\overline{M}$ with respect to a $G$-invariant CTL formula $\phi$ lift to model repairs of $M$ with respect to $\phi$. Using this theory we devised a method for repairing concurrent programs which exploits this correspondence, thus avoiding state explosion. We construct the quotient structure $\overline{M}$ directly from $P$ without the need to construct the structure $M$. By our correspondence, a repair of $\overline{M}$ lifts to a repair of the structure $M$, which in turn corresponds to a repair of $P$. We showed how to construct a repair of $P$ using the repair of $\overline{M}$ while circumventing the creation of the larger Kripke structure.

A Kripke structure M that can be repaired with respect to a formula ϕ can be repaired via abstraction. However, not every repair of an abstracted structure N corresponds to a repair of M. In contrast, the structure might not be repairable using the quotient structure, but any repair of the quotient structure will lift to a repair of the original structure.

## **References**


### 540 P. C. Attie and W. L. Cocke

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Subgame Optimal Strategies in Finite Concurrent Games with Prefix-Independent Objectives**

Benjamin Bordais, Patricia Bouyer, and Stéphane Le Roux

Université Paris-Saclay, CNRS, ENS Paris-Saclay, LMF, 91190 Gif-sur-Yvette, France bordais@lsv.fr

**Abstract.** We investigate concurrent two-player win/lose stochastic games on finite graphs with prefix-independent objectives. We characterize subgame optimal strategies and use this characterization to show various memory transfer results: 1) For a given (prefix-independent) objective, if every game that has a subgame *almost-surely winning* strategy also has a positional one, then every game that has a subgame *optimal* strategy also has a positional one; 2) Assume that the (prefix-independent) objective has a neutral color. If every *turn-based* game that has a subgame almost-surely winning strategy also has a positional one, then every game that has a *finite-choice* (notion to be defined) subgame optimal strategy also has a positional one.

We collect or design examples to show that our results are tight in several ways. We also apply our results to Büchi, co-Büchi, parity, mean-payoff objectives, thus yielding simpler statements.

## **1 Introduction**

Turn-based two-player win/lose (stochastic) games on finite graphs have been intensively studied in the context of model checking in a broad sense [19,1]. These games behave well regarding optimality in various settings. Most importantly for this paper, [14] proved the following results for finite turn-based stochastic games with prefix-independent objectives: (1) every game has deterministic optimal strategies; (2) from every value-1 state, there is an optimal, i.e. almost-surely winning, strategy; (3) if from every value-1 state of every game there is an optimal strategy using some fixed amount of memory, every game has an optimal strategy using this amount of memory. These results are of either of the following generic forms:


The concurrent version of these turn-based (stochastic) games has a higher modeling power than the turn-based version: this is really useful in practice since real-world systems are intrinsically concurrent [15]. They are played on a finite graph as follows: at each player state, the two players stochastically and independently choose one among finitely many actions. This yields a Nature state, which stochastically draws a next player state, from where each player chooses one action again, and so on. Each player state is labelled by a color, and who wins depends on the infinite sequence of colors underlying the (stochastically) generated infinite sequence of player states. Unfortunately, these concurrent games do not behave well in general even for simple winning conditions and simple graph structures, like finite graphs:


In this paper, we focus on concurrent stochastic finite games. Therefore, the generic forms of our results will be more complex, in order to take into account the above-mentioned discrepancies. They will somehow be given as generic statements as follows:


Much of the difficulty consists in fine-tuning the strength of "nice", "nicer" and "special" above. We present below our main contributions on finite two-player win/lose concurrent stochastic games with prefix-independent objectives:

	- (a) Theorem 2: If every game that has a subgame *almost-surely winning* strategy also has a positional one, then every game that has a subgame *optimal* strategy also has a positional one.
	- (b) Corollary 1: every Büchi or co-Büchi game that has a subgame optimal strategy has a positional one. (Whereas parity games may require infinite memory [12].)

Note that the transfer result **2a** can be generalized from positional to finite memory.

3. We say that a strategy has finite-choice, if it uses only finitely many action distributions. Note that finite-memory (resp. deterministic) strategies clearly have finite choice.


Note that **3a** and **3b** are false if the word finite-choice is removed [4]. The proof of **3b** invokes **3a**. The flavor (and proofs) of **3b** and **2a** are similar, but both premises and conclusions are weakened in **3b**, as emphasized.

**Related works.** A large part of this paper is dedicated to the extension to concurrent games of the results from [14] regarding the transfer of memory from almost-surely winning strategies to optimal strategies in turn-based games. Note that the proof technique used in [14] is different and could not be adapted to our more general setting. In their proof, both players agree on a preference over Nature states and play according to this preference. In our proof, we slice the graph into value areas (that is, sets of states with the same value), and show that it is sufficient to play an almost-sure winning strategy in each slice; we then glue these (partial) strategies together to get a subgame-optimal strategy over the whole graph.

The slicing technique was already used in the context of concurrent games in [8]. The authors focus on parity objectives and establish a memory transfer result from limit-sure winning strategies to almost-optimal strategies. As an application, they show that, for co-Büchi objectives, since positional strategies are sufficient to win limit-surely, they are also sufficient to win almost-optimally. Their construction makes heavy use of the specific nature of parity objectives.

We also mention [6], where the focus is also on concurrent games with prefix-independent objectives. In particular, the authors establish a (very useful) result: if all states have positive values, then they all have value 1. (Note that a strengthening of this result is presented in this paper (Theorem 3), which also appears as an adaptation of a result proved in [14]). This result is then used in another context with non-zero-sum games.

Finally, some recent works on concurrent games have been done in [2,3,4], where the goal is the following: local interactions of the two players in the player state are given by bi-dimensional tables; those tables can be abstracted as game forms, where (output) variables are issues of the local interaction (possibly several issues are labelled by the same variable). The goal of this series of works is to give (intrinsic) properties of these game forms, so that, when used in a graph game, the existence of optimal strategies is ensured. For instance, in [3], a property of game forms, called RM, is given, which ensures that, if one only uses RM game forms in a graph, then for every reachability objective, Player A will always have an optimal strategy for that objective. This property is a characterization of well-behaved game forms regarding reachability objectives, since every game form which is not RM can be embedded into a (small) graph game in such a way that Player A does not have an optimal strategy. This line of work differs substantially from the target of the current paper.

**Structure of the paper.** Section 2 presents notations, Section 3 recalls the notion of game forms, Section 4 introduces our formalism, Section 5 exhibits a necessary and sufficient pair of conditions for subgame optimality, Section 6 shows a memory transfer from subgame almost-surely winning to subgame optimal in concurrent games, and Section 7 adapts the results of the previous section to the case of the existence of a subgame finite-choice strategy.

Detailed proofs and additional formal definitions are available in [5].

## **2 Preliminaries**

Consider a non-empty set $Q$. We denote by $Q^{\ast}$, $Q^{+}$ and $Q^{\omega}$ the sets of finite sequences, non-empty finite sequences and infinite sequences of elements of $Q$ respectively. For $n \in \mathbb{N}$, we denote by $Q^n$ (resp. $Q^{\le n}$) the set of sequences of (resp. at most) $n$ elements of $Q$. For all $\rho = q_1 \cdots q_n \in Q^n$ and $i \le n$, we denote by $\rho_i$ the element $q_i \in Q$ and by $\rho_{\le i} \in Q^i$ the finite sequence $q_1 \cdots q_i$. For a subset $S \subseteq Q$, we denote by $Q^{\ast} \cdot S^{\omega} \subseteq Q^{\omega}$ the set of infinite paths that eventually settle in $S$ and by $(Q^{\ast} \cdot S)^{\omega} \subseteq Q^{\omega}$ the set of infinite paths visiting infinitely often the set $S$.

A *discrete probabilistic distribution* over a non-empty finite set $Q$ is a function $\mu : Q \to [0, 1]$ such that $\sum_{x \in Q} \mu(x) = 1$. The *support* $\mathsf{Supp}(\mu)$ of a probabilistic distribution $\mu : Q \to [0, 1]$ is the set of non-zeros of the distribution: $\mathsf{Supp}(\mu) = \{q \in Q \mid \mu(q) \in (0, 1]\}$. The set of all distributions over $Q$ is denoted $\mathcal{D}(Q)$.

## **3 Game forms**

We recall the definition of game forms – informally, bi-dimensional tables with variables – and of games in normal forms – game forms whose outcomes are values between 0 and 1.

**Definition 1 (Game form and game in normal form).** *A* game form *(GF for short) is a tuple $\mathcal{F} = \langle \mathsf{Act}_A, \mathsf{Act}_B, O, \varrho \rangle$, where $\mathsf{Act}_A$ (resp. $\mathsf{Act}_B$) is the non-empty finite set of actions available to Player $A$ (resp. $B$), $O$ is a non-empty set of outcomes, and $\varrho : \mathsf{Act}_A \times \mathsf{Act}_B \to O$ is a function that associates an outcome to each pair of actions. When the set of outcomes $O$ is equal to $[0, 1]$, we say that $\mathcal{F}$ is a* game in normal form*. For a valuation $v \in [0, 1]^O$ of the outcomes, the notation $\langle \mathcal{F}, v \rangle$ refers to the game in normal form $\langle \mathsf{Act}_A, \mathsf{Act}_B, [0, 1], v \circ \varrho \rangle$.*

We use game forms to represent interactions between two players. The strategies available to Player A (resp. B) are convex combinations of actions given as the rows (resp. columns) of the table. In a game in normal form, Player A tries to maximize the outcome, whereas Player B tries to minimize it.

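**Definition 2 (Outcome of a game in normal form).** *Consider a game in normal form $\mathcal{F} = \langle \mathsf{Act}_A, \mathsf{Act}_B, [0, 1], \varrho \rangle$. The set $\mathcal{D}(\mathsf{Act}_A)$ (resp. $\mathcal{D}(\mathsf{Act}_B)$) is the set of strategies available to Player $A$ (resp. $B$). For a pair of strategies $(\sigma_A, \sigma_B) \in \mathcal{D}(\mathsf{Act}_A) \times \mathcal{D}(\mathsf{Act}_B)$, the outcome $\mathsf{out}_{\mathcal{F}}(\sigma_A, \sigma_B)$ in $\mathcal{F}$ of the strategies $(\sigma_A, \sigma_B)$ is defined as: $\mathsf{out}_{\mathcal{F}}(\sigma_A, \sigma_B) := \sum_{a \in \mathsf{Act}_A} \sum_{b \in \mathsf{Act}_B} \sigma_A(a) \cdot \sigma_B(b) \cdot \varrho(a, b) \in [0, 1]$.*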
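The outcome of a pair of GF-strategies is an expected value, which the following sketch computes directly from the double sum. The dictionary encoding of the table and of distributions, and the 2x2 "matching pennies" table, are assumptions made for illustration.

```python
def outcome(table, sigma_a, sigma_b):
    """Expected outcome when the players independently draw actions
    from the distributions sigma_a and sigma_b."""
    return sum(sigma_a[a] * sigma_b[b] * table[(a, b)]
               for a in sigma_a for b in sigma_b)

# Hypothetical game in normal form: Player A gets 1 when the actions match.
table = {("h", "h"): 1.0, ("h", "t"): 0.0,
         ("t", "h"): 0.0, ("t", "t"): 1.0}
uniform = {"h": 0.5, "t": 0.5}
pure_h = {"h": 1.0, "t": 0.0}

mixed_outcome = outcome(table, uniform, uniform)
pure_outcome = outcome(table, pure_h, pure_h)
```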

**Definition 3 (Value of a game in normal form and optimal strategies).** *Consider a game in normal form $\mathcal{F} = \langle \mathsf{Act}_A, \mathsf{Act}_B, [0, 1], \varrho \rangle$ and a strategy $\sigma_A \in \mathcal{D}(\mathsf{Act}_A)$ for Player $A$. The* value *of the strategy $\sigma_A$, denoted $\mathsf{val}_{\mathcal{F}}(\sigma_A)$, is equal to: $\mathsf{val}_{\mathcal{F}}(\sigma_A) := \inf_{\sigma_B \in \mathcal{D}(\mathsf{Act}_B)} \mathsf{out}_{\mathcal{F}}(\sigma_A, \sigma_B)$, and analogously for Player $B$, with a $\sup$ instead of an $\inf$. When $\sup_{\sigma_A \in \mathcal{D}(\mathsf{Act}_A)} \mathsf{val}_{\mathcal{F}}(\sigma_A) = \inf_{\sigma_B \in \mathcal{D}(\mathsf{Act}_B)} \mathsf{val}_{\mathcal{F}}(\sigma_B)$, it defines the* value *of the game $\mathcal{F}$, denoted $\mathsf{val}_{\mathcal{F}}$.*

*A strategy $\sigma_A \in \mathcal{D}(\mathsf{Act}_A)$ ensuring $\mathsf{val}_{\mathcal{F}} = \mathsf{val}_{\mathcal{F}}(\sigma_A)$ is called* optimal*. The set of all optimal strategies for Player $A$ is denoted $\mathsf{Opt}_A(\mathcal{F}) \subseteq \mathcal{D}(\mathsf{Act}_A)$, and analogously for Player $B$. Von Neumann's minimax theorem [20] ensures the existence of optimal strategies (for both players).*
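The value of a single GF-strategy of Player A can be evaluated without searching over mixed strategies of Player B: the outcome is linear in Player B's distribution, so the infimum is attained at a pure action. The sketch below relies on that standard observation; the table and the helper name `strategy_value` are hypothetical.

```python
def strategy_value(table, sigma_a, actions_b):
    """Value of sigma_a for Player A: the worst expected outcome over
    Player B's pure actions (enough because the outcome is linear in sigma_b)."""
    return min(sum(p * table[(a, b)] for a, p in sigma_a.items())
               for b in actions_b)

# Hypothetical "matching pennies" table with outcomes in [0, 1].
table = {("h", "h"): 1.0, ("h", "t"): 0.0,
         ("t", "h"): 0.0, ("t", "t"): 1.0}

value_uniform = strategy_value(table, {"h": 0.5, "t": 0.5}, ["h", "t"])
value_pure = strategy_value(table, {"h": 1.0, "t": 0.0}, ["h", "t"])
```

In this table the uniform strategy guarantees one half while a pure strategy guarantees nothing, which illustrates why optimal GF-strategies are in general proper mixtures.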

In the following, strategies in games in normal forms will be called GF-strategies, in order not to confuse them with strategies in concurrent (graph) games.

## **4 Concurrent games and optimal strategies**

## **4.1 Concurrent arenas and strategies**

We introduce the definition of concurrent arenas played on a finite graph.

**Definition 4 (Finite stochastic concurrent arena).** *A* colored concurrent arena *$\mathcal{C}$ is a tuple $\langle Q, (A_q)_{q \in Q}, (B_q)_{q \in Q}, \mathsf{D}, \delta, \mathsf{dist}, K, \mathsf{col} \rangle$ where $Q$ is the non-empty finite set of states, for all $q \in Q$, $A_q$ (resp. $B_q$) is the non-empty finite set of actions available to Player $A$ (resp. $B$) at state $q$, $\mathsf{D}$ is the finite set of Nature states, $\delta : \bigcup_{q \in Q} (\{q\} \times A_q \times B_q) \to \mathsf{D}$ is the transition function, $\mathsf{dist} : \mathsf{D} \to \mathcal{D}(Q)$ is the distribution function. Furthermore, $K$ is the non-empty finite set of colors and $\mathsf{col} : Q \to K$ is the coloring function.*

In the following, the arena $\mathcal{C}$ will refer to the tuple $\langle Q, (A_q)_{q \in Q}, (B_q)_{q \in Q}, \mathsf{D}, \delta, \mathsf{dist}, K, \mathsf{col} \rangle$, unless otherwise stated. A concurrent game is obtained from a concurrent arena by adding a winning condition: the set of infinite paths winning for Player A (and losing for Player B).
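One possible in-memory representation of such an arena, together with the one-step successor distribution induced by a pair of action distributions, is sketched below. The class name, field names, and the small three-state arena are assumptions for illustration and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Arena:
    states: list      # Q
    actions_a: dict   # q -> list of actions available to Player A at q
    actions_b: dict   # q -> list of actions available to Player B at q
    delta: dict       # (q, a, b) -> Nature state d
    dist: dict        # d -> {successor state: probability}
    col: dict         # q -> color

def next_state_distribution(arena, q, sigma_a, sigma_b):
    """Distribution over successors of q when the players mix with sigma_a, sigma_b."""
    result = {}
    for a in arena.actions_a[q]:
        for b in arena.actions_b[q]:
            weight = sigma_a.get(a, 0.0) * sigma_b.get(b, 0.0)
            nature = arena.delta[(q, a, b)]
            for q_next, p in arena.dist[nature].items():
                result[q_next] = result.get(q_next, 0.0) + weight * p
    return result

# Hypothetical arena: matching actions lead surely to "win", others toss a fair coin.
arena = Arena(
    states=["q0", "win", "lose"],
    actions_a={"q0": [0, 1], "win": ["stay"], "lose": ["stay"]},
    actions_b={"q0": [0, 1], "win": ["stay"], "lose": ["stay"]},
    delta={("q0", 0, 0): "d_win", ("q0", 1, 1): "d_win",
           ("q0", 0, 1): "d_half", ("q0", 1, 0): "d_half",
           ("win", "stay", "stay"): "d_win", ("lose", "stay", "stay"): "d_lose"},
    dist={"d_win": {"win": 1.0}, "d_lose": {"lose": 1.0},
          "d_half": {"win": 0.5, "lose": 0.5}},
    col={"q0": 0, "win": 2, "lose": 1},
)
step = next_state_distribution(arena, "q0", {0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5})
```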

**Definition 5 (Finite stochastic concurrent game).** *A finite* concurrent game *is a pair $\langle \mathcal{C}, W \rangle$ where $\mathcal{C}$ is a finite concurrent colored arena and $W \subseteq K^{\omega}$ is Borel. The set $W$ is called the* objective*, as it corresponds to the set of colored paths winning for Player $A$.*

In this paper, we only consider a specific kind of objectives: prefix-independent ones. Informally, they correspond to objectives W such that an infinite path ρ is in W if and only if any of its suffixes is in W. More formally:

**Definition 6 (Prefix-independent objectives).** *For a non-empty finite set of colors $K$ and $W \subseteq K^{\omega}$, $W$ is said to be* prefix-independent *(PI for short) if, for all $\rho \in K^{\omega}$ and $i \ge 0$, $\rho \in W \Leftrightarrow \rho_{\ge i} \in W$.*

In the following, we refer to concurrent games with prefix-independent objectives as PI concurrent games.

**Definition 7 (Parity, Büchi, co-Büchi objectives).** *Let $K \subset \mathbb{N}$ be a finite non-empty set of integers. Consider a concurrent arena $\mathcal{C}$ with $K$ as set of colors. For an infinite path $\rho \in Q^{\omega}$, we denote by $\mathsf{col}(\rho)_{\infty} \subseteq \mathbb{N}$ the set of colors seen infinitely often in $\rho$: $\mathsf{col}(\rho)_{\infty} := \{n \in \mathbb{N} \mid \forall i \in \mathbb{N}, \exists j \ge i, \mathsf{col}(\rho_j) = n\}$. Then, the* parity objective *w.r.t. $\mathsf{col}$ is the set $W_{\mathsf{Parity}}(\mathsf{col}) := \{\rho \in Q^{\omega} \mid \max \mathsf{col}(\rho)_{\infty} \text{ is even}\}$. The Büchi (resp. co-Büchi) objective corresponds to the parity objective with $K := \{1, 2\}$ (resp. $K := \{0, 1\}$).*
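For ultimately periodic paths (a finite prefix followed by a cycle repeated forever), the parity condition can be checked directly, since the colors seen infinitely often are exactly the colors on the cycle. The lasso representation and the function names below are assumptions made for this sketch; arbitrary infinite paths are of course not finitely representable in this way.

```python
def colors_seen_infinitely_often(cycle, col):
    """Colors occurring infinitely often along prefix . cycle^omega."""
    return {col[q] for q in cycle}

def satisfies_parity(prefix, cycle, col):
    """Parity objective on the lasso path prefix . cycle^omega.

    The prefix is deliberately ignored: the objective is prefix-independent.
    """
    return max(colors_seen_infinitely_often(cycle, col)) % 2 == 0

# Hypothetical coloring of three states.
col = {"a": 1, "b": 2, "c": 3}
wins = satisfies_parity(["c"], ["a", "b"], col)   # color 3 is seen only finitely often
loses = satisfies_parity([], ["a"], col)
```

With colors restricted to $\{1, 2\}$ this reduces to the Büchi reading: the path is winning exactly when color 2 is seen infinitely often.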

Strategies are then defined as functions that, given the history of the game (i.e. the sequence of states already seen) associate a distribution on the actions available to the Player.

**Definition 8 (Strategies).** *Consider a concurrent game $\mathcal{C}$. A strategy for Player $A$ is a function $s_A : Q^{+} \to \mathcal{D}(A)$ with $A := \bigcup_{q \in Q} A_q$ such that, for all $\rho = q_0 \cdots q_n \in Q^{+}$, we have $s_A(\rho) \in \mathcal{D}(A_{q_n})$. We denote by $\mathsf{S}^A_{\mathcal{C}}$ the set of all strategies in arena $\mathcal{C}$ for Player $A$. This is analogous for Player $B$.*

Given two strategies $s_A, s_B$ for both players in an arena $\mathcal{C}$ from a starting state $q_0$, we define in the usual manner the probability $\mathbb{P}^{\mathcal{C},q_0}_{s_A,s_B}$ of a finite path, which induces the probability of an arbitrary Borel subset of infinite paths. Values of strategies and of the game are defined below.

**Definition 9 (Value of strategies and of the game).** *Let $\mathcal{G} = \langle \mathcal{C}, W \rangle$ be a PI concurrent game and consider a strategy $s_A \in \mathsf{S}^A_{\mathcal{C}}$ for Player $A$. The function $\chi_{\mathcal{G}}[s_A] : Q \to [0, 1]$ giving the value of the strategy $s_A$ is such that, for all $q_0 \in Q$, we have $\chi_{\mathcal{G}}[s_A](q_0) := \inf_{s_B \in \mathsf{S}^B_{\mathcal{C}}} \mathbb{P}^{\mathcal{C},q_0}_{s_A,s_B}[W]$. The function $\chi_{\mathcal{G}}[A] : Q \to [0, 1]$ giving the value for Player $A$ is such that, for all $q_0 \in Q$, we have $\chi_{\mathcal{G}}[A](q_0) := \sup_{s_A \in \mathsf{S}^A_{\mathcal{C}}} \chi_{\mathcal{G}}[s_A](q_0)$. The function $\chi_{\mathcal{G}}[B] : Q \to [0, 1]$ giving the value of the game for Player $B$ is defined similarly by reversing the supremum and infimum.*

*By Martin's result on the determinacy of Blackwell games [17], for all concurrent games $\mathcal{G} = \langle \mathcal{C}, W \rangle$, the value functions for both Players are equal; this defines the value function $\chi_{\mathcal{G}} : Q \to [0, 1]$ of the game: $\chi_{\mathcal{G}} := \chi_{\mathcal{G}}[A] = \chi_{\mathcal{G}}[B]$.*

We define value areas: subsets of states whose values are the same.

**Definition 10 (Value area).** *In a PI concurrent game $\mathcal{G}$, $V_{\mathcal{G}}$ refers to the set of values appearing in the game: $V_{\mathcal{G}} := \{\chi_{\mathcal{G}}(q) \mid q \in Q\}$. Furthermore, for all $u \in V_{\mathcal{G}}$, $Q_u \subseteq Q$ refers to the set of states whose value is $u$ w.r.t. $\chi_{\mathcal{G}}$: $Q_u := \{q \in Q \mid \chi_{\mathcal{G}}(q) = u\}$.*
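As an illustration of Definition 10, the following minimal sketch partitions a state space into value areas. The state names and the valuation `chi` are hypothetical and chosen only for the example; the sketch assumes the valuation is already known and does not compute it.

```python
from collections import defaultdict
from fractions import Fraction


def value_areas(chi):
    """Group states by value: returns {u: set of states q with chi[q] == u}."""
    areas = defaultdict(set)
    for q, u in chi.items():
        areas[u].add(q)
    return dict(areas)


# Hypothetical valuation on four states (values chosen for illustration only).
chi = {
    "q0": Fraction(1, 2),
    "q1": Fraction(1, 2),
    "q2": Fraction(1),
    "q3": Fraction(0),
}

areas = value_areas(chi)   # the value areas Q_u
V = set(areas)             # the set V_G of values appearing in the game
```

Exact fractions are used rather than floats so that two states are placed in the same area only when their values are exactly equal.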

In concurrent games, game forms appear at each state and describe the interactions of the players at that state. Furthermore, the valuation mapping each state to its value in the game can be lifted, via a convex combination, into a valuation of the Nature states. This, in turn, induces a natural way to define the game in normal form appearing at each state.

**Definition 11 (Local interactions, Lifting valuations).** *In a PI concurrent game $\mathcal{G}$ where the valuation $\chi_{\mathcal{G}} : Q \to [0,1]$ gives the values of the game, the lift $\nu_{\mathcal{G}} : \mathsf{D} \to [0,1]$ is such that, for all $d \in \mathsf{D}$, we have $\nu_{\mathcal{G}}(d) := \sum_{q \in Q} \chi_{\mathcal{G}}(q) \cdot \mathsf{dist}(d)(q)$ (recall that $\mathsf{dist} : \mathsf{D} \to \mathcal{D}(Q)$ is the distribution function).*

*Let $q \in Q$. The* local interaction *at state $q$ is the game form $\mathcal{F}_q = \langle A_q, B_q, \mathsf{D}, \delta(q, \cdot, \cdot) \rangle$. The game in normal form at state $q$ is then $\mathcal{F}^{\mathsf{nf}}_q := \langle \mathcal{F}_q, \nu_{\mathcal{G}} \rangle$.*
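The lift of Definition 11 is a plain convex combination, which the sketch below spells out. All names (`lift`, `normal_form`, the states, the Nature states `d_*` and the actions) are hypothetical; the valuation `chi` is taken as given, and the example is a matching-pennies-like interaction chosen so that a value of 1/2 at `q0` is plausible.

```python
from fractions import Fraction


def lift(chi, dist):
    """nu(d) = sum over q of chi[q] * dist[d][q], for every Nature state d."""
    return {d: sum(chi[q] * p for q, p in mu.items()) for d, mu in dist.items()}


def normal_form(delta_q, nu):
    """Payoff table of the game in normal form at one state: (a, b) -> nu(delta(q, a, b))."""
    return {ab: nu[d] for ab, d in delta_q.items()}


# Hypothetical valuation and Nature states (illustration only).
chi = {"q0": Fraction(1, 2), "win": Fraction(1), "lose": Fraction(0)}
dist = {
    "d_win": {"win": Fraction(1)},
    "d_lose": {"lose": Fraction(1)},
    "d_mix": {"win": Fraction(1, 4), "lose": Fraction(3, 4)},
}
nu = lift(chi, dist)

# Local interaction at q0: each pair of actions leads to a Nature state.
delta_q0 = {
    ("a1", "b1"): "d_win",
    ("a1", "b2"): "d_lose",
    ("a2", "b1"): "d_lose",
    ("a2", "b2"): "d_win",
}
payoffs = normal_form(delta_q0, nu)
```

The Nature state `d_mix` is not used by the interaction at `q0`; it is included only to show a lift that is a genuine convex combination.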

The values of the game in normal form $\mathcal{F}^{\mathsf{nf}}_q$ and of the state $q$ are equal.

**Proposition 1.** *In a PI concurrent game $\mathcal{G}$, for all states $q \in Q$, we have $\chi_{\mathcal{G}}(q) = \mathsf{out}_{\mathcal{F}^{\mathsf{nf}}_q}$.*

#### **4.2 More on strategies**

In this subsection, we define several kinds of strategies. Let us fix a PI concurrent game G for the rest of this section. First, we consider optimal strategies, i.e. strategies realizing the value of the game. Strategies are positively-optimal if their values are positive from all states whose value is positive.

**Definition 12 ((Positively-) optimal strategies).** *A Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}} \in \mathsf{S}^{\mathsf{A}}_{\mathcal{C}}$ is (resp. positively-)* optimal *from a state $q \in Q$ if $\chi_{\mathcal{G}}(q) = \chi_{\mathcal{G}}[\mathsf{s}_{\mathsf{A}}](q)$ (resp. if $\chi_{\mathcal{G}}(q) > 0 \Rightarrow \chi_{\mathcal{G}}[\mathsf{s}_{\mathsf{A}}](q) > 0$). It is (resp. positively-) optimal if this holds from all states $q \in Q$.*

Note that the definition of optimal strategies we consider is sometimes referred to as uniform optimality, as it holds from every state of the game. However, it does not say anything about what happens once some sequence of states has been seen. We would now like to define a notion of strategy that is optimal from any point that can occur after any finite sequence of states has been seen. This corresponds to subgame optimal strategies. To define them, we need to introduce the notion of residual strategy.

**Definition 13 (Residual and Subgame Optimal Strategies).** *For all finite sequences $\rho \in Q^+$, the* residual strategy *$\mathsf{s}^{\rho}_{\mathsf{A}}$ of a Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}}$ is the strategy $\mathsf{s}^{\rho}_{\mathsf{A}} : Q^+ \to \mathcal{D}(A)$ such that, for all $\pi \in Q^+$, we have $\mathsf{s}^{\rho}_{\mathsf{A}}(\pi) := \mathsf{s}_{\mathsf{A}}(\rho \cdot \pi)$.*

*The Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}}$ is* subgame optimal *if, for all $\rho = \rho' \cdot q \in Q^+$, the residual strategy $\mathsf{s}^{\rho'}_{\mathsf{A}}$ is optimal from $q$, i.e. $\chi_{\mathcal{G}}[\mathsf{s}^{\rho'}_{\mathsf{A}}](q) = \chi_{\mathcal{G}}(q)$.*
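The residual strategy of Definition 13 can be pictured as a closure that prepends the already-seen history. In this minimal sketch, strategies are modelled (as an assumption of the example) as Python functions from tuples of states to dictionaries mapping actions to probabilities; the strategy `s` and the action names are hypothetical.

```python
from fractions import Fraction


def residual(s, rho):
    """Residual strategy after the history rho: maps pi to s(rho . pi)."""
    rho = tuple(rho)
    return lambda pi: s(rho + tuple(pi))


def s(history):
    """Hypothetical strategy whose randomization depends on the history length."""
    eps = Fraction(1, 2 ** len(history))
    return {"top": 1 - eps, "mid": eps}


# After having seen q0 twice, the residual strategy behaves on the history (q0,)
# as s behaves on the full history (q0, q0, q0).
r = residual(s, ("q0", "q0"))
```

Checking subgame optimality itself is not attempted here, since it quantifies over all histories and all opposing strategies.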

Note that, in particular, subgame optimal strategies are optimal strategies. When such strategies exist, we want them to be as simple as possible. For instance, we want them to be positional, i.e. to depend only on the current state of the game.

As for Player B, we will consider a specific kind of strategies, namely deterministic strategies. That is because, once a Player A strategy is fixed we obtain an (infinite) MDP. In such a context, ε-optimal strategies can be chosen among deterministic strategies (see for instance the explanation in [9, Thm. 1]).

**Definition 14 (Positional, Deterministic strategies).** *A Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}}$ is* positional *if, for all states $q \in Q$ and paths $\rho \in Q^+$, we have $\mathsf{s}_{\mathsf{A}}(\rho \cdot q) = \mathsf{s}_{\mathsf{A}}(q)$. A Player $\mathsf{B}$ strategy $\mathsf{s}_{\mathsf{B}}$ is* deterministic *if, for all finite sequences $\rho \cdot q \in Q^+$, there exists $b \in B_q$ such that $\mathsf{s}_{\mathsf{B}}(\rho \cdot q)(b) = 1$.*
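The two notions of Definition 14 can be sketched as follows, again modelling strategies as functions from tuples of states to action distributions (an assumption of this example). Since a program can only inspect finitely many histories, `looks_positional` checks the defining equation on a supplied sample and can refute positionality but not establish it in general. All names are hypothetical.

```python
from fractions import Fraction


def positional_from_table(table):
    """Build a positional strategy: the choice depends only on the last state."""
    return lambda history: table[history[-1]]


def looks_positional(s, histories):
    """Check s(rho . q) == s(q) on a finite sample of histories."""
    return all(s(h) == s((h[-1],)) for h in histories)


def is_deterministic_choice(distribution):
    """A distribution over actions is deterministic if one action has probability 1."""
    return any(p == 1 for p in distribution.values())


table = {
    "q0": {"a1": Fraction(1, 2), "a2": Fraction(1, 2)},
    "q1": {"a1": Fraction(1)},
}
s_pos = positional_from_table(table)


def counting(history):
    """Hypothetical strategy that is not positional: it depends on the history length."""
    p = Fraction(1, len(history))
    return {"a1": p, "a2": 1 - p}
```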

## **5 Necessary and sufficient condition for subgame optimality**

In this section, we present a necessary and sufficient pair of conditions for a Player A strategy to be subgame optimal, formally stated in Theorem 1. The arguments given here are somewhat similar to the ones given in Section 4 of [4], which deals with the same question restricted to positional strategies.

The first condition is local: it specifies how a strategy behaves in the games in normal form at each local interaction of the game. As mentioned in Proposition 1, at each state $q$, the value of the game in normal form $\mathcal{F}^{\mathsf{nf}}_q$ is equal to the value of the state $q$ (given by the valuation $\chi_{\mathcal{G}} \in [0,1]^Q$). This suggests that, for all finite sequences of states $\rho = \rho' \cdot q \in Q^+$ ending at that state $q$, the GF-strategy $\mathsf{s}_{\mathsf{A}}(\rho)$ needs to be optimal in the game in normal form $\mathcal{F}^{\mathsf{nf}}_q$ for the residual strategy $\mathsf{s}^{\rho'}_{\mathsf{A}}$ to be optimal from $q$. Strategies with such a property are called locally optimal. This is a necessary condition for subgame optimality. (However, it is neither a necessary nor a sufficient condition for optimality, as argued in Section 6.)

**Definition 15 (Locally optimal strategies).** *Consider a PI concurrent game $\mathcal{G}$. A Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}}$ is locally optimal if, for all $\rho = \rho' \cdot q \in Q^+$, the GF-strategy $\mathsf{s}_{\mathsf{A}}(\rho)$ is optimal in the game in normal form $\mathcal{F}^{\mathsf{nf}}_q$. That is (recalling that $\nu_{\mathcal{G}} \in [0,1]^{\mathsf{D}}$ lifts the valuation $\chi_{\mathcal{G}} \in [0,1]^Q$ to the Nature states), for all $b \in B_q$:* $\chi_{\mathcal{G}}(q) \le \sum_{a \in A_q} \mathsf{s}_{\mathsf{A}}(\rho)(a) \cdot \nu_{\mathcal{G}} \circ \delta(q, a, b) = \mathsf{out}_{\mathcal{F}^{\mathsf{nf}}_q}(\mathsf{s}_{\mathsf{A}}(\rho), b)$.
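The inequality of Definition 15 can be checked directly at a single state once the value of the state and the lifted valuation are known. The sketch below does this for one GF-strategy; the interaction, the lifted values `nu` and the claimed state value 1/2 are hypothetical (a matching-pennies-like example), and none of them is computed by the code.

```python
from fractions import Fraction


def outcome(sigma, b, delta_q, nu):
    """Expected lifted value when Player A plays the distribution sigma and Player B plays b."""
    return sum(p * nu[delta_q[(a, b)]] for a, p in sigma.items())


def is_optimal_gf_strategy(sigma, value, actions_b, delta_q, nu):
    """Definition 15 at one state: value <= outcome(sigma, b) for every Player B action b."""
    return all(outcome(sigma, b, delta_q, nu) >= value for b in actions_b)


# Hypothetical local interaction at a state assumed to have value 1/2.
nu = {"d_win": Fraction(1), "d_lose": Fraction(0)}
delta_q = {
    ("a1", "b1"): "d_win",
    ("a1", "b2"): "d_lose",
    ("a2", "b1"): "d_lose",
    ("a2", "b2"): "d_win",
}
value_q = Fraction(1, 2)
actions_b = ["b1", "b2"]

uniform = {"a1": Fraction(1, 2), "a2": Fraction(1, 2)}
pure_a1 = {"a1": Fraction(1)}
```

A strategy is locally optimal when this check succeeds for the distribution it prescribes after every finite history, which a finite program can only sample.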

**Lemma 1.** *In a PI concurrent game, subgame optimal strategies are locally optimal.*

Note that this was already shown for positional strategies in [4].

Local optimality does not ensure subgame optimality in general. However, it does ensure that, for all Player $\mathsf{B}$ deterministic strategies, the game almost-surely eventually settles in a value area, i.e. in $Q_u$ for some $u \in V_{\mathcal{G}}$.

**Lemma 2.** *Consider a PI concurrent game $\mathcal{G}$ and a Player $\mathsf{A}$ locally optimal strategy $\mathsf{s}_{\mathsf{A}}$. For all Player $\mathsf{B}$ deterministic strategies $\mathsf{s}_{\mathsf{B}}$, almost surely the states seen infinitely often have the same value. That is: $\mathbb{P}_{\mathsf{s}_{\mathsf{A}},\mathsf{s}_{\mathsf{B}}}\big[\bigcup_{u \in V_{\mathcal{G}}} Q^{\ast} \cdot (Q_u)^{\omega}\big] = 1$.*

*Proof (Sketch).* First, if a state of value 1 is reached (i.e. a state in $Q_1$), then all states that can be seen with positive probability have value 1 (i.e. are in $Q_1$), since the strategy $\mathsf{s}_{\mathsf{A}}$ is locally optimal. Let now $u \in V_{\mathcal{G}}$ be the highest value in $V_{\mathcal{G}}$ that is not 1 and consider the set of infinite paths such that the set $Q_u$ is seen infinitely often but the game does not settle in it, i.e. the set $(Q^{\ast} \cdot (Q \setminus Q_u))^{\omega} \cap (Q^{\ast} \cdot Q_u)^{\omega} \subseteq Q^{\omega}$. Since the strategy $\mathsf{s}_{\mathsf{A}}$ is locally optimal (and since $V_{\mathcal{G}}$ is finite), one can show that there is a probability $p > 0$ such that the conditional probability of reaching $Q_1$ knowing that $Q_u$ is left is at least $p$. Hence, if $Q_u$ is left infinitely often, almost-surely the set $Q_1$ is seen (and never left). It follows that the probability of the event $(Q^{\ast} \cdot (Q \setminus Q_u))^{\omega} \cap (Q^{\ast} \cdot Q_u)^{\omega}$ is 0. This implies that, almost-surely, if the set $Q_u$ is seen infinitely often, then from some point on it is never left. The same arguments can then be used with the highest value in $V_{\mathcal{G}}$ that is less than $u$, etc. Overall, we obtain that, for all $u \in V_{\mathcal{G}}$, if a set $Q_u$ is seen infinitely often, it is almost-surely eventually never left.

Local optimality ensures that, at each step, the expected value of the states reached does not worsen (and may even improve if Player $\mathsf{B}$ does not play optimally). By propagating this property, we obtain that, given a Player $\mathsf{A}$ locally optimal strategy and a Player $\mathsf{B}$ deterministic strategy, the convex combination of the values $u$ in $V_{\mathcal{G}}$, weighted by the probability of settling in the value area $Q_u$ from a state $q$, is at least equal to the value $\chi_{\mathcal{G}}(q)$ of that state. This is stated in Lemma 3 below.

**Lemma 3.** *For a PI concurrent game $\mathcal{G}$, a Player $\mathsf{A}$ locally optimal strategy $\mathsf{s}_{\mathsf{A}}$, a Player $\mathsf{B}$ deterministic strategy $\mathsf{s}_{\mathsf{B}}$ and a state $q \in Q$: $\chi_{\mathcal{G}}(q) \le \sum_{u \in V_{\mathcal{G}}} u \cdot \mathbb{P}^{q}_{\mathsf{s}_{\mathsf{A}},\mathsf{s}_{\mathsf{B}}}[Q^{\ast} \cdot (Q_u)^{\omega}]$.*

Note that if Player B plays subgame optimally, then this inequality is an equality.

*Proof (Sketch).* First, let us denote $\mathbb{P}^{q}_{\mathsf{s}_{\mathsf{A}},\mathsf{s}_{\mathsf{B}}}$ by $\mathbb{P}$. It can be shown by induction that, for all $i \in \mathbb{N}^{\ast}$, we have the property $\mathcal{P}(i)$: $\chi_{\mathcal{G}}(q) \le \sum_{\pi \cdot q' \in q \cdot Q^{i}} \chi_{\mathcal{G}}(q') \cdot \mathbb{P}(\pi \cdot q') = \sum_{u \in V_{\mathcal{G}} \setminus \{0\}} u \cdot \mathbb{P}[q \cdot Q^{i-1} \cdot Q_u]$. Furthermore, since by Lemma 2 the game almost-surely settles in a value area, it can be shown that for $n$ large enough, the probability of being in $Q_u$ after $n$ steps (i.e. $\mathbb{P}[q \cdot Q^{n-1} \cdot Q_u]$) is arbitrarily close to the probability of eventually settling in $Q_u$ (i.e. $\mathbb{P}[Q^{\ast} \cdot (Q_u)^{\omega}]$). We can then apply $\mathcal{P}(n)$ to obtain the desired inequality.

Recall that we are considering a pair of conditions to characterize that a strategy is subgame optimal. The first condition is local optimality. To summarize, we have seen that the fact that a strategy is locally optimal ensures that, from any state $q$, the expected value of the value area where the game settles is at least $\chi_{\mathcal{G}}(q)$. However, local optimality does not ensure anything as to the probability of $W$ given that the game settles in a specific value area. This is where the second condition comes into play. For the explanations regarding this condition, we will need Lemma 4 below, a consequence of Lévy's 0-1 Law.

**Lemma 4.** *Let $\mathcal{M}$ be a countable Markov chain with a PI objective. If there is a $q \in Q$ such that $\chi_{\mathcal{M}}(q) < 1$, then $\inf_{q' \in Q} \chi_{\mathcal{M}}(q') = 0$.*

Consider now a Player $\mathsf{A}$ subgame optimal strategy $\mathsf{s}_{\mathsf{A}}$ and a Player $\mathsf{B}$ deterministic strategy. Let us consider what happens if the game eventually settles in $Q_u$ for some $u \in V_{\mathcal{G}} \setminus \{0\}$. Assume towards a contradiction that there is a finite path after which the probability of $W$ given that the play eventually settles in $Q_u$ is less than 1. Then, there is a continuation of this path ending in $Q_u$ for which this probability of $W$ is less than $u$. Indeed, for a PI objective in a countable Markov chain (which is what we obtain once strategies for both players are fixed), if there is a state with a value less than 1, then the infimum of the values in the Markov chain is 0 (this is what is stated in Lemma 4). Under our assumption, there would therefore be a finite path from which the Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}}$ is not optimal, which contradicts the fact that it is subgame optimal. Hence, a second necessary condition for subgame optimality (in addition to local optimality) is: from all finite paths, for all Player $\mathsf{B}$ deterministic strategies, for all positive values $u \in V_{\mathcal{G}} \setminus \{0\}$, the probability of $W$ and eventually settling in $Q_u$ is equal to the probability of eventually settling in $Q_u$. We obtain the theorem below.

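**Theorem 1.** *Consider a concurrent game $\mathcal{G}$ with a PI objective $W$ and a Player $\mathsf{A}$ strategy $\mathsf{s}_{\mathsf{A}} \in \mathsf{S}^{\mathsf{A}}_{\mathcal{C}}$. The strategy $\mathsf{s}_{\mathsf{A}}$ is subgame optimal if and only if:*

1. *it is locally optimal, and*
2. *from all finite paths, for all Player $\mathsf{B}$ deterministic strategies and all positive values $u \in V_{\mathcal{G}} \setminus \{0\}$, the probability of $W$ and eventually settling in $Q_u$ is equal to the probability of eventually settling in $Q_u$.*
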
*Proof (Sketch).* Lemma 1 states that local optimality is necessary and we have informally argued above why the second condition is also necessary for subgame optimality. As for the fact that they are sufficient conditions, this is a direct consequence of Lemmas 2 and 3 and the fact that deterministic strategies can achieve the same values as arbitrary strategies in MDPs (which we obtain once a Player A strategy is fixed), as cited in Subsection 4.2.

One may ask what happens in the special case where the strategy s<sup>A</sup> considered is positional. As mentioned above, such a characterization was already presented in [4] <sup>1</sup>. Overall, we obtain a similar result except that the second condition is replaced by what happens in the game restricted to the End Components in the Markov Decision Process induced by the positional strategy sA.

## **6 From subgame almost-surely winning to subgame optimality**

In [14, Thm. 4.5], the authors have proved a transfer result in PI turn-based games: the amount of memory sufficient to play optimally in every state of value 1 of every game is also sufficient to play optimally in every game. This result does not hold on concurrent games as is. First, although there are always optimal strategies in PI turn-based games (as proved in the same paper [14, Thm. 4.3]), there are PI concurrent games without optimal strategies. Second, infinite memory may be required to play optimally in co-Büchi concurrent games whereas almost-surely winning strategies can be found among positional strategies in a turn-based setting. This can be seen in the game of Figure 1 with $\mathsf{col}(q_0) = 0$ and $\mathsf{col}(q_1) = \mathsf{col}(q'_1) = 1$. The green values in the local interaction at state $q_0$ are the values of the game if they are reached (the game ends immediately). If a green value is not reached, the objective of Player $\mathsf{A}$ is to see the states $q_1$ and $q'_1$ only finitely often. It has already been argued in [4] that the value of this game is $1/2$ and that there is an optimal strategy for Player $\mathsf{A}$, but it requires infinite memory. To play optimally, Player $\mathsf{A}$ must play the top row with probability $1 - \varepsilon_k$ and the middle row with probability $\varepsilon_k$, for $\varepsilon_k > 0$ that goes (fast) to 0 when $k$ goes to $\infty$ (where $k$ denotes the number of steps). The $\varepsilon_k$ must be chosen so that, if Player $\mathsf{B}$ always plays the left column with probability 1, then the state $q_1$ is seen finitely often with probability 1. Furthermore, as soon as the state $q'_1$ is visited, Player $\mathsf{A}$ switches to a positional strategy playing the bottom row with probability $\varepsilon'_k$ small enough (where $k$ denotes the number of steps before the state $q'_1$ was seen) and the two top rows with probability $(1 - \varepsilon'_k)/2$.

<sup>1</sup> The proof was only presented for a specific class of objectives.

Hence, the transfer of memory from almost-surely winning to optimal does not hold in concurrent games even if it is assumed that optimal strategies exist. However, one can note that although the strategy described above is optimal, it is not subgame optimal. Indeed, when the strategy switches, the value of the residual strategy is $1/2 - \varepsilon'_k < 1/2$. In fact, there is no subgame optimal strategy in that game. Actually, if we assume that not only optimal but subgame optimal strategies exist, then the transfer of memory does hold.

The aim of this section is twofold: first, we identify a necessary and sufficient condition for the existence of subgame optimal strategies<sup>2</sup>. Second, we establish the above-mentioned memory transfer that relates the amount of memory to play subgame optimally and to be almost-surely winning. Before stating the main theorem of this section, let us first introduce the definition of positionally subgame almost-surely winnable objective, i.e. objectives for which subgame almost-surely winning strategies can be found among positional strategies.

**Definition 16 (Positionally subgame almost-surely winnable objective).** *Consider a PI objective $W \subseteq K^{\omega}$. It is said to be a positionally subgame almost-surely winnable objective (*PSAW *for short) if the following holds: in all concurrent games $\mathcal{G} = \langle \mathcal{C}, W \rangle$ where there is a subgame almost-surely winning strategy, there is a positional one.*

<sup>2</sup> Note that this is different from what we did in the previous section: there, we established a necessary and sufficient condition for a specific strategy to be subgame optimal. Here, given a game, we consider necessary and sufficient conditions on the game for the existence of a subgame optimal strategy.

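**Theorem 2.** *Consider a non-empty finite set of colors $K$ and a PI objective $\emptyset \subsetneq W \subseteq K^{\omega}$. Consider a concurrent game $\mathcal{G}$ with objective $W$. Then, the three following assertions are equivalent:*

*a. there exists a subgame optimal strategy;*

*b. there exists an optimal strategy that is locally optimal;*

*c. there exists a positively-optimal strategy that is locally optimal.*
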
*Furthermore, if this holds and if the objective* W *is* PSAW*, then there exists a subgame optimal positional strategy.*

First, note that the equivalence is stated in terms of existence of strategies, not in terms of the strategies themselves. In particular, any subgame optimal strategy is both optimal and locally optimal; however, an optimal strategy that is locally optimal is not necessarily a subgame optimal strategy. Second, it is straightforward that point *a* implies point *b* (from Theorem 1) and that point *b* implies point *c* (by definition of positively-optimal strategies). In the remainder of this section, we explain informally the constructions leading to the proof of this theorem, i.e. to the proof that point *c* implies point *a*. The transfer of memory is a direct consequence of the way this theorem is proven. We fix a PI concurrent game $\mathcal{G} = \langle \mathcal{C}, W \rangle$ for the rest of the section.

The idea is as follows. As stated in Theorem 1, subgame optimal strategies are locally optimal and win the game almost-surely if the game settles in a value area Q<sup>u</sup> for some positive u ∈ V<sup>G</sup> \{0}. Our idea is therefore to consider subgame almost-surely winning strategies in the derived game Gu: a "restriction" of the game G to Q<sup>u</sup> (more details will be given later). We can then glue together these subgame almost-surely winning strategies – defined for all u ∈ V<sup>G</sup> \ {0} – into a subgame optimal strategy. However, there are some issues:


Note that the method we use here is different from what the authors of [14] did to prove the transfer of memory in turn-based games.

Let us first deal with issue **3**. One can ensure that the almost-surely winning strategies in the game $\mathcal{G}_u$ are all locally optimal in $\mathcal{G}$ by properly defining the game $\mathcal{G}_u$. More specifically, this is done by enforcing that the only possible Player $\mathsf{A}$ strategies in $\mathcal{G}_u$ are locally optimal in the game $\mathcal{G}$. To do so, we construct the game $\mathcal{G}_u$ whose state space is $Q_u$ (plus gadget states) but whose set of actions $A_{\mathcal{F}^{\mathsf{nf}}_q}$, at a state $q \in Q_u$, is such that the set of strategies $\mathcal{D}(A_{\mathcal{F}^{\mathsf{nf}}_q})$ corresponds exactly to the set of optimal strategies in the original game in normal form $\mathcal{F}^{\mathsf{nf}}_q$, while keeping the set of actions $A_{\mathcal{F}^{\mathsf{nf}}_q}$ for Player $\mathsf{A}$ finite. This is possible thanks to Proposition 2 below: in every game in normal form $\mathcal{F}^{\mathsf{nf}}_q$ at state $q \in Q_u$, there exists a finite set $A_{\mathcal{F}^{\mathsf{nf}}_q}$ of optimal strategies such that the optimal strategies in $\mathcal{F}^{\mathsf{nf}}_q$ are exactly the convex combinations of strategies in $A_{\mathcal{F}^{\mathsf{nf}}_q}$. This is a well-known result, argued for instance in [18].

**Fig. 4.** The local interaction $\mathcal{F}_{q_0}$ at state $q_0$.

**Fig. 5.** The game in normal form $\mathcal{F}^{\mathsf{nf}}_{q_0}$ at the state $q_0$.

**Fig. 6.** The game $\mathcal{F}^{\mathsf{opt},\mathsf{nf}}_{q_0}$ with only optimal strategies.

**Fig. 7.** The game form $\mathcal{F}^{\mathsf{opt}}_{q_0}$ with only optimal strategies.

**Proposition 2.** *Consider a game in normal form $\mathcal{F}^{\mathsf{nf}} = \langle A, B, [0,1], \delta \rangle$ with $|A| = n$ and $|B| = k$. There exists a set $A_{\mathcal{F}^{\mathsf{nf}}} \subseteq \mathsf{Opt}_{\mathsf{A}}(\mathcal{F}^{\mathsf{nf}})$ of optimal strategies such that $|A_{\mathcal{F}^{\mathsf{nf}}}| \le n + k$ and $\mathcal{D}(A_{\mathcal{F}^{\mathsf{nf}}}) = \mathsf{Opt}_{\mathsf{A}}(\mathcal{F}^{\mathsf{nf}})$.*

*Proof (Sketch).* One can write a system of $n + k$ inequalities (with some additional equalities) whose set of solutions is exactly the set of optimal GF-strategies $\mathsf{Opt}_{\mathsf{A}}(\mathcal{F}^{\mathsf{nf}})$. The result then follows from standard arguments on systems of linear inequalities, as the space of solutions is in fact a polytope with at most $n + k$ vertices.


We illustrate this construction: a part of a concurrent game is depicted in Figure 3 and the change of the interaction of the players at state q<sup>0</sup> is depicted in Figures 4, 5, 6 and 7.

The game G<sup>u</sup> has the same objective W as the game G. Since we want all the states to have value 1 in G<sup>u</sup> (recall issue **1**), we will build the game G<sup>u</sup> such that any edge leading to a state not in Q<sup>u</sup> in G now leads to a PI concurrent game G<sup>W</sup> (with the same objective W) where all states have value 1. The game G<sup>W</sup> is (for instance) a clique with all colors in K where Player A plays alone.

An illustration of this construction can be found in Figures 8 and 9. The blue dotted arrows are the ones that need to be redirected when the game is changed. With such a definition, we have made some progress w.r.t. the issue **1** cited previously (regarding the values being equal to 1): the values of all states of the game G<sup>u</sup> are positive (for positive u).

**Lemma 5.** *Consider the game $\mathcal{G}_u$ for some positive $u \in V_{\mathcal{G}} \setminus \{0\}$ and assume that, in $\mathcal{G}$, there exists a positively-optimal strategy that is locally optimal. Then, for all states $q$ in $\mathcal{G}_u$, the value of the state $q$ in $\mathcal{G}_u$ is positive: $\chi_{\mathcal{G}_u}(q) > 0$.*

*Proof (Sketch).* Consider a state $q \in Q_u$ and a Player $\mathsf{A}$ locally optimal strategy $\mathsf{s}_{\mathsf{A}}$ in $\mathcal{G}$ that is positively-optimal from $q$. Then, the strategy $\mathsf{s}_{\mathsf{A}}$ (restricted to $Q_u^+$) can be seen as a strategy in $\mathcal{G}_u$ (it has to be defined in $\mathcal{G}_W$, but this can be done straightforwardly). Note that this is only possible because the strategy $\mathsf{s}_{\mathsf{A}}$ is locally optimal (due to the definition of $\mathcal{G}_u$). For a Player $\mathsf{B}$ strategy $\mathsf{s}_{\mathsf{B}}$ in $\mathcal{G}_u$, consider what happens with strategies $\mathsf{s}_{\mathsf{A}}$ and $\mathsf{s}_{\mathsf{B}}$ in both games $\mathcal{G}_u$ and $\mathcal{G}$. Either the game stays indefinitely in $Q_u$, and what happens in $\mathcal{G}_u$ and $\mathcal{G}$ is identical, or it eventually leaves $Q_u$, leading to states of value 1 in $\mathcal{G}_u$. Hence, the value of the game $\mathcal{G}_u$ from $q$ with strategies $\mathsf{s}_{\mathsf{A}}$ and $\mathsf{s}_{\mathsf{B}}$ is at least the value of the game $\mathcal{G}$ from $q$ with the same strategies. Thus, the value of the state $q$ is positive in $\mathcal{G}_u$.

**Fig. 8.** The depiction of a PI concurrent game with its value areas.

**Fig. 9.** The PI concurrent game after the modifications described above.

As it turns out, Lemma 5 suffices to deal with both issues **1** and **2** at the same time. Indeed, as stated in Theorem 3 below, it is a general result that in a PI concurrent game, if all states have positive values, then all states have value 1 and there is a subgame almost-surely winning strategy.

**Theorem 3.** *Consider a PI concurrent game* G *and assume that all state values are greater than or equal to* c > 0*, i.e. for all* q ∈ Q*,* χG(q) ≥ c*. Then, there is a subgame almost-surely winning strategy in* G*.*

*Remark 1.* This theorem can be seen as a strengthening of Theorem 1 from [6]. Indeed, this Theorem 1 states that if all states have positive values, then they all have value 1 (this is then generalized to games with countably-many states). Theorem 3 is stronger since it ensures the existence of (subgame) almost-surely winning strategies. Although a detailed proof is provided in the complete version of this paper [5], note that this theorem was already stated and proven in [14] in the context of PI turn-based games. Nevertheless their arguments could have been used *verbatim* for concurrent games as well. In [5], we give a proof using the same construction (namely, reset strategies) but we argue differently why the construction proves the theorem.

We can now glue together pieces of strategies $\mathsf{s}^{u}_{\mathsf{A}}$ defined in all games $\mathcal{G}_u$ into a single strategy $\mathsf{s}_{\mathsf{A}}[(\mathsf{s}^{u}_{\mathsf{A}})_{u \in V_{\mathcal{G}} \setminus \{0\}}]$. Informally, the glued strategy mimics the strategy $\mathsf{s}^{u}_{\mathsf{A}}$ on $Q_u^+$ and switches strategy when a value area is left and another one is reached.

**Definition 17 (Gluing strategies).** *Consider a PI concurrent game $\mathcal{G}$ and, for all values $u \in V_{\mathcal{G}} \setminus \{0\}$, a strategy $\mathsf{s}^{u}_{\mathsf{A}}$ in the game $\mathcal{G}_u$. Then, we glue these strategies into the strategy $\mathsf{s}_{\mathsf{A}}[(\mathsf{s}^{u}_{\mathsf{A}})_{u \in V_{\mathcal{G}} \setminus \{0\}}] : Q^+ \to \mathcal{D}(A)$, simply written $\mathsf{s}_{\mathsf{A}}$, such that, for all $\rho$ ending at state $q \in Q$:*

$$\mathsf{s}_{\mathsf{A}}(\rho) := \begin{cases} \mathsf{s}_{\mathsf{A}}^{u}(\pi) & \text{if } u = \chi_{\mathcal{G}}(q) > 0 \text{, for } \pi \text{ the longest suffix of } \rho \text{ in } Q_{u}^{+}\\ \text{is arbitrary} & \text{if } \chi_{\mathcal{G}}(q) = 0 \end{cases}$$
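The gluing of Definition 17 can be sketched as follows. Strategies are again modelled as functions on tuples of states (an assumption of this example); the per-area strategies below simply return a label together with the suffix they were given, so that the suffix selected by the glued strategy can be inspected. The state names, the valuation `chi` and the `arbitrary` fallback are hypothetical.

```python
from fractions import Fraction


def glue(strategies, chi, arbitrary):
    """Glue per-area strategies {u: strategy on Q_u^+} into one strategy on Q^+."""
    def glued(history):
        u = chi[history[-1]]
        if u == 0:
            # Any choice is allowed in states of value 0.
            return arbitrary(history)
        # Find the longest suffix of the history that stays inside the value area Q_u.
        start = len(history)
        while start > 0 and chi[history[start - 1]] == u:
            start -= 1
        return strategies[u](tuple(history[start:]))
    return glued


# Hypothetical valuation: states a, b have value 1/2, state c has value 1, state z has value 0.
chi = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1), "z": Fraction(0)}
strategies = {
    Fraction(1, 2): lambda pi: ("half", pi),
    Fraction(1): lambda pi: ("one", pi),
}
glued = glue(strategies, chi, arbitrary=lambda history: ("any", ()))
```

If every per-area strategy ignores all but the last state of its suffix, the glued strategy does too, which matches the transfer of positionality discussed in the text.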

As stated in Lemma 6 below, the construction described in Definition 17 transfers almost-surely winning strategies in G<sup>u</sup> into a subgame optimal strategy in G.

**Lemma 6.** *For all $u \in V_{\mathcal{G}} \setminus \{0\}$, let $\mathsf{s}^{u}_{\mathsf{A}}$ be a subgame almost-surely winning strategy in $\mathcal{G}_u$. The glued strategy $\mathsf{s}_{\mathsf{A}}[(\mathsf{s}^{u}_{\mathsf{A}})_{u \in V_{\mathcal{G}} \setminus \{0\}}]$, denoted $\mathsf{s}_{\mathsf{A}}$, is subgame optimal in $\mathcal{G}$.*

*Proof (Sketch).* We apply Theorem 1. First, the strategy $\mathsf{s}_{\mathsf{A}}$ is locally optimal in all $Q_u$ for $u > 0$ by the strategy restriction done to define the game $\mathcal{G}_u$ (only optimal strategies are considered in each game in normal form $\mathcal{F}^{\mathsf{nf}}_q$ at states $q \in Q_u$). Furthermore, any strategy is optimal in a game in normal form of value 0 (which is the case of the games in normal form at states in $Q_0$). Second, if the game eventually settles in a value area $Q_u$ for some $u > 0$, from then on the strategy $\mathsf{s}_{\mathsf{A}}$ mimics the strategy $\mathsf{s}^{u}_{\mathsf{A}}$, which is subgame almost-surely winning in $\mathcal{G}_u$. Hence, the probability of $W$ given that the game eventually settles in $Q_u$ is 1. This holds for all $u \in V_{\mathcal{G}} \setminus \{0\}$, so the second condition of Theorem 1 holds.

We now have all the ingredients to prove Theorem 2.

*Proof (Of Theorem 2).* We consider the PI concurrent game $\mathcal{G}$ and assume that there is a positively-optimal strategy that is locally optimal. Then, by Lemma 5, for all positive values $u \in V_{\mathcal{G}} \setminus \{0\}$, all states in $\mathcal{G}_u$ have positive values. It follows, by Theorem 3, that there exists a subgame almost-surely winning strategy in every game $\mathcal{G}_u$ for $u \in V_{\mathcal{G}} \setminus \{0\}$. Gluing these strategies together then yields a strategy that is subgame optimal by Lemma 6.

The second part of the theorem, dealing with the transfer of positionality from subgame almost-surely winning to subgame optimal, follows from the fact that if the strategies $\mathsf{s}^{u}_{\mathsf{A}}$ are positional for all $u \in V_{\mathcal{G}} \setminus \{0\}$, then so is the glued strategy $\mathsf{s}_{\mathsf{A}}[(\mathsf{s}^{u}_{\mathsf{A}})_{u \in V_{\mathcal{G}} \setminus \{0\}}]$.

We now apply the result of Theorem 2 to two specific classes of objectives: Büchi and co-Büchi objectives. Note that this result is already known for Büchi objectives, proven in [4].

**Corollary 1.** *Consider a concurrent game with a Büchi (resp. co-Büchi) objective and assume that there is a positively-optimal strategy that is locally optimal. Then there is a subgame optimal positional strategy.*

Note that it is also possible to prove a memory transfer from subgame almost-surely winning to subgame optimal for an arbitrary memory skeleton, instead of only positional strategies. This adds only a few minor difficulties.

**Application to the turn-based setting.** The aim of Section 6 was to extend an already existing result on turn-based games to the context of concurrent games. This required an adaptation of the assumptions. However, it is in fact possible to retrieve the original result on turn-based games from Theorem 2 in a fairly straightforward manner. It amounts to showing that, in all finite turn-based games $\mathcal{G}$, for all values $u \in V_{\mathcal{G}} \setminus \{0\}$, there is a locally optimal strategy that is positively-optimal from all states in $Q_u$.

## **7 Finite-choice strategies**

In this section, we introduce a new kind of strategy, namely finite-choice strategies. Let us first motivate why we consider such strategies. Consider again the co-Büchi game of Figure 1. Recall that the optimal strategy we described first plays the top row with increasing probability and the middle row with decreasing probability and then, once Player B plays the second column, switches to a positional strategy playing the bottom row with positive, yet small enough, probability. Note that switching strategies is essential. Indeed, if Player A does not switch, Player B could at some point opt for the middle column and see the state $q'_1$ indefinitely with very high probability. In fact, what happens in that case is rather counter-intuitive: once Player B switches, there is infinitely often a positive probability of reaching the outcome of value 1. However, the probability of ever reaching this outcome can be arbitrarily small if Player B waits long enough before playing the middle column. This happens because the probability $\varepsilon_k$ of visiting that outcome goes (fast) to 0 when $k$ goes to $\infty$. In fact, such an optimal strategy has "infinite choice" in the sense that it may prescribe infinitely many different probability distributions.
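The counter-intuitive point above can be illustrated numerically. The following is only a sketch under stated assumptions: the sequence `eps(k) = 2^-(k+1)` is hypothetical (the text only requires that the probabilities go fast to 0, and the sequence of Figure 1 is not reproduced here), and we assume that, once Player B has switched at step `n`, each later step `k` independently reaches the value-1 outcome with probability `eps(k)`. Under these assumptions, the probability of ever reaching that outcome is $1 - \prod_{k \geq n} (1 - \varepsilon_k)$, which we approximate by truncating the product.

```python
from math import prod


def eps(k: int) -> float:
    """Hypothetical sequence eps_k = 2^-(k+1); an assumption, not the paper's."""
    return 2.0 ** -(k + 1)


def prob_ever_reaching(n: int, horizon: int = 200) -> float:
    """Probability that at least one of the steps n, ..., n + horizon - 1 succeeds,
    assuming step k succeeds independently with probability eps(k).
    The infinite product is truncated after `horizon` factors."""
    return 1.0 - prod(1.0 - eps(k) for k in range(n, n + horizon))


if __name__ == "__main__":
    # The later Player B switches, the smaller the probability becomes,
    # although every single step still has positive probability.
    for n in (0, 5, 10, 20):
        print(n, prob_ever_reaching(n))
```

Every step keeps a positive success probability, yet the overall probability tends to 0 as `n` grows, because the hypothetical sequence is summable.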

In this section, we consider *finite-choice strategies*, i.e. strategies that can use only finitely many GF-strategies at each state.

**Definition 18 (Finite-choice strategy).** *Let* $G$ *be a concurrent game. A Player* $A$ *strategy* $s_A$ *in* $G$ *has* finite choice *if, for all* $q \in Q$*, the set* $S_q^{s_A} := \{ s_A(\rho \cdot q) \mid \rho \in Q^+ \} \subseteq D(A_q)$ *is finite.*
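As a hedged illustration of this definition, the sketch below collects the distributions a strategy prescribes at a state over all histories up to a bounded length. Everything in it is hypothetical: the two-state arena, the action names `top` and `bottom`, and both strategies. A bounded enumeration cannot prove that a set is finite; it only shows how the set stays constant for a positional strategy and keeps growing for a strategy with vanishing probabilities.

```python
from fractions import Fraction
from itertools import product

STATES = ("q0", "q1")  # hypothetical two-state arena


def positional(history):
    """Prescribes the same distribution over actions after every history."""
    return (("top", Fraction(1, 2)), ("bottom", Fraction(1, 2)))


def vanishing(history):
    """Plays 'bottom' with probability 2^-(k+1), where k is the number of
    occurrences of q0 in the history (a hypothetical choice of sequence)."""
    k = history.count("q0")
    e = Fraction(1, 2 ** (k + 1))
    return (("top", 1 - e), ("bottom", e))


def choices_at(strategy, q, max_len):
    """Distributions strategy(rho . q) for all nonempty rho of length <= max_len."""
    seen = set()
    for n in range(1, max_len + 1):
        for rho in product(STATES, repeat=n):
            seen.add(strategy(rho + (q,)))
    return seen


if __name__ == "__main__":
    for bound in (1, 3, 5):
        print(bound,
              len(choices_at(positional, "q0", bound)),
              len(choices_at(vanishing, "q0", bound)))
```

The count for `positional` stays at 1 whatever the bound, while the count for `vanishing` increases with the bound, which is the behaviour that the definition rules out.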

Note that positional (even finite-memory) and deterministic strategies are examples of finite-choice strategies.

Interestingly, we can link finite-choice strategies with the existence of subgame optimal strategies. In general, the existence of optimal strategies does not imply the existence of subgame optimal strategies (as exemplified in the game of Figure 1). However, in Theorem 4 below, we state that if we additionally assume that the optimal strategy considered has finite choice, then there is a subgame optimal strategy (that also has finite choice).

**Theorem 4.** *Consider a PI concurrent game* G*. If there is a finite-choice optimal strategy, then there is a finite-choice subgame optimal strategy.*

*Proof (Sketch).* Consider such an optimal finite-choice strategy $s_A$. In particular, note that there is a constant $c > 0$ such that, for all $\rho \cdot q \in Q^+$ and all $a \in A_q$, we have $s_A(\rho \cdot q)(a) > 0 \Rightarrow s_A(\rho \cdot q)(a) \geq c$. We build a subgame optimal strategy $s'_A$ in the following way: for all $\rho = \rho' \cdot q \in Q^+$, if the residual strategy $s_A^{\rho}$ is optimal, then $s'_A(\rho) := s_A(\rho)$; otherwise $s'_A(\rho) := s_A(q)$ (i.e. we reset the strategy). Straightforwardly, the strategy $s'_A$ has finite choice. We want to apply Theorem 1 to prove that it is subgame optimal. One can see that it is locally optimal (by the criterion chosen for resetting the strategy). Consider now some $\rho \in Q^+$ ending at state $q \in Q$ and another state $q' \in Q$. Assume that the residual strategy $s_A^{\rho}$ is optimal but that the residual strategy $s_A^{\rho \cdot q'}$ is not. Then, similarly to why local optimality is necessary for subgame optimality (see Proposition 1), one can show that any Player B action $b$ leading to $q'$ from $\rho$ with positive probability is such that $\chi_G(q) < \mathsf{out}_{F^{\mathrm{nf}}_q}(s_A(\rho), b)$. Hence, if Player B opts for the action $b$, there is a positive probability from $\rho$ of reaching a state of value different from $u = \chi_G(q)$. And if this happens infinitely often, a state of value different from $u$ will be reached almost-surely<sup>3</sup>. In other words, almost-surely, if a value area is never left, then the strategy $s'_A$ resets only finitely often.

Consider now some $\rho \in Q^+$, a deterministic Player B strategy $s_B$ and a value $u \in V_G \setminus \{0\}$. From what we argued above, the probability of the event $Q^* \cdot (Q_u)^\omega$ (resp. $W \cap Q^* \cdot (Q_u)^\omega$) is unchanged if we intersect it with the event that the strategy $s'_A$ resets only finitely often. Furthermore, if the strategy does not reset anymore from some point on, and all states have the same value $u > 0$, then it follows that the probability of $W$ is 1 (since $W$ is PI). We can then conclude by applying Theorem 1.
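The almost-sure claim at the end of this paragraph can be made quantitative. The following is a sketch under assumptions, not the authors' argument: Player B plays a deterministic strategy, $p_{\min}$ (our notation) denotes the smallest positive probability among all Nature states, and $\delta := c \cdot p_{\min} > 0$ is the lower bound described in footnote 3. Call a step *risky* if Player B plays an action $b$ as above, and let $R_n$ be the event that at least $n$ risky steps occur and none of the first $n$ of them leads to a state of value different from $u$. Since each risky step, conditioned on the past, leaves the value area with probability at least $\delta$, we get

$$
\mathbb{P}(R_n) \;\leq\; (1 - \delta)^n \xrightarrow[n \to \infty]{} 0,
\qquad\text{hence}\qquad
\mathbb{P}\Big(\bigcap_{n \geq 1} R_n\Big) = 0 .
$$

Every reset is preceded by a risky step, so the event "the value area is never left and the strategy resets infinitely often" is contained in $\bigcap_{n \geq 1} R_n$ and therefore has probability 0.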

Finite-choice strategies are interesting for another reason. In the previous section, we applied the memory transfer from Theorem 2 to the Büchi and co-Büchi objectives. We did not apply it to other objectives, in particular to the parity objective. Indeed, in general, contrary to the case of turn-based games, infinite memory may be necessary to be almost-surely winning in parity games. This happens in Figure 2 (already described in [12]), where the objective of Player A is to see $q_1$ infinitely often while seeing $q_2$ only finitely often. Let us describe a Player A subgame almost-surely winning strategy. Let $k$ denote the number of times the state $q_0$ has been seen. The top row is played with probability $1 - \varepsilon_k$ and the bottom row with probability $\varepsilon_k > 0$, with $\varepsilon_k$ going to 0 when $k$ goes to $\infty$ (the sequence $(\varepsilon_k)$ used in the game of Figure 1 works here as well). Such a strategy is subgame almost-surely winning and does not have finite choice. In fact, it can be shown that all Player A finite-choice strategies have value 0 in that game.

Interestingly, the memory transfer of Theorem 2 is adapted in Theorem 5 to the memory that is sufficient in turn-based games, for those PI objectives that have a "neutral color", if we additionally assume that the subgame optimal strategy considered has finite choice. First, let us define what is meant by "neutral color"; then we define the turn-based version of PSAW.

<sup>3</sup> This holds because the strategy $s_A$ has finite choice: the probability of seeing a state of different value is bounded below by the product of $c$ and the smallest positive probability among all Nature states.

**Definition 19 (Objective with a neutral color).** *Consider a set of colors* $K$ *and a PI objective* $W \subseteq K^\omega$*. It has a* neutral color *if there is some (neutral) color* $k \in K$ *such that, for all* $\rho = \rho_0 \cdot \rho_1 \cdots \in K^\omega$*, we have* $\rho \in W \Leftrightarrow \rho_0 \cdot k \cdot \rho_1 \cdot k \cdots \in W$*.*
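To see what this definition asks for, consider a small sketch for the parity objective. It rests on assumptions that are ours and not the source's: colors are natural numbers, the objective is max-parity (a word is winning when the largest color seen infinitely often is even), and we only test ultimately periodic words, written as a finite prefix followed by a cycle repeated forever. Under this convention the smallest color, here 0, behaves as a neutral color; with a min-parity convention the largest color would play that role instead.

```python
def parity_wins(cycle):
    """Max-parity on an ultimately periodic word prefix . cycle^omega.
    The colors seen infinitely often are exactly those of the cycle, so the
    prefix is irrelevant (the objective is prefix-independent)."""
    return max(cycle) % 2 == 0


def interleave(word, k):
    """Insert color k after every letter: c0 c1 ... becomes c0 k c1 k ..."""
    out = []
    for c in word:
        out.extend((c, k))
    return out


if __name__ == "__main__":
    # Interleaving the cycle with color 0 never changes the winner ...
    for cycle in ([1], [2, 1], [3, 2], [0], [1, 1, 4]):
        print(cycle, parity_wins(cycle), parity_wins(interleave(cycle, 0)))
    # ... whereas interleaving with color 3 can: [2] is winning, [2, 3] is not.
    print(parity_wins([2]), parity_wins(interleave([2], 3)))
```

Interleaving the whole word amounts to interleaving the prefix and the cycle separately, which is why it suffices to look at the cycle.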

**Definition 20 (PSAW objective in turn-based games).** *Consider a PI objective* $W \subseteq K^\omega$*. It is* positionally subgame almost-surely winnable in turn-based games *(*PSAWT *for short) if in all turn-based games* $G = \langle C, W \rangle$ *where there is a subgame almost-surely winning strategy, there is a positional one.*

**Theorem 5.** *Consider a* PSAWT *PI objective* $W \subseteq K^\omega$ *with a neutral color and a concurrent game* $G$ *with objective* $W$*. Assume there is a subgame optimal strategy that has finite choice. Then, there is a positional one.*

*Proof (Sketch).* A finite-choice strategy $s_A$ plays only among a finite number of GF-strategies at each state. The idea is therefore to modify the game $G_u$ of the previous subsection into a game $G'_u$ by transforming it into a (finite) turn-based game. At each state, Player A first chooses her GF-strategy. She can choose among only a finite number of them: at a state $q$, she has at her disposal only the optimal GF-strategies in $S_q^{s_A}$ (recall Definition 18). We consider the objective $W$ in that new arena, where Player B states are colored with a neutral color. The existence, in $G$, of a subgame optimal strategy that has finite choice ensures that all states in $G'_u$ have positive values. We can then conclude as for Theorem 2: a subgame optimal strategy can be obtained by gluing together subgame almost-surely winning strategies in the (turn-based) games $G'_u$ (which can be chosen positional by assumption).

As an application, one can check that the parity, mean-payoff and generalized Büchi objectives have a neutral color and are PSAWT ([11,16,7]). Hence, for these objectives, if there exists an optimal strategy that has finite choice, then there is one that is positional.

**Corollary 2.** *Consider a concurrent game* $G$ *with a parity (resp. mean-payoff, resp. generalized Büchi) objective. Assume that there is an optimal strategy that has finite choice in* $G$*. Then, there is a positional one.*

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Author Index**

#### **A**

Ahman, Danel 1 Attie, Paul C. 520

#### **B**

Baumann, Pascal 240 Bernardo, Marco 265 Boker, Udi 371 Bordais, Benjamin 541 Bouyer, Patricia 541

#### **C**

Chen, Zhibo 68 Cocke, William L. 520

#### **D**

D'Alessandro, Flavio 240 de Amorim, Pedro H. Azevedo 89 Douéneau-Tabot, Gaëtan 436 Dubut, Jérémy 308

#### **E**

Echahed, Rachid 135 Echenim, Mnacho 135

#### **G**

Ganardi, Moses 240 Goncharov, Sergey 46 Groote, Jan Friso 413

#### **H**

Hainry, Emmanuel 156 Hefetz, Guy 371 Henzinger, Thomas A. 349 Hirschkoff, Daniel 24 Hofmann, Dirk 46 Hojjat, Hossein 413 Holík, Lukáš 392

#### **I**

Ibarra, Oscar 240

#### **J**

Jaber, Guilhem 24

#### **K**

Kupke, Clemens 328

#### **L**

Labbaf, Faezeh 413 Le, Quang Loc 477 Le, Xuan-Bach D. 477 Licata, Daniel R. 113 Lopez, Aliaume 456

#### **M**

Mazowiecki, Filip 196 Mazzocchi, Nicolas 349 McQuillan, Ian 240 Mhalla, Mehdi 135 Mousavi, Mohammad Reza 413

#### **N**

New, Max S. 113 Nora, Pedro 46

#### **P**

Péchoux, Romain 156 Peltier, Nicolas 135 Pfenning, Frank 68 Prakash, Aditya 218 Prebet, Enguerrand 24

#### **R**

Rady, Amgad 285 Rossi, Sabina 265 Rot, Jurriaan 328 Roux, Stéphane Le 541

#### **S**

Saraç, N. Ege 349 Schoen, Ezra 328 Schröder, Lutz 46 Schütze, Lia 240 Síč, Juraj 392 Silva, Mário 156 Sinclair-Banks, Henry 196 Starchak, Mikhail R. 176

#### **T**

Thejaswini, K. S. 218 Turkenburg, Ruben 328 Turoňová, Lenka 392

#### **V**

van Breugel, Franck 285 van Glabbeek, Rob 498 Vojnar, Tomáš 392

#### **W**

Węgrzycki, Karol 196 Wild, Paul 46 Wißmann, Thorsten 308

#### **Z**

Zetzsche, Georg 240